A Q-learning approach to waste rock reduction in open-pit mine design based on cleaner production principles

Naser Badakhshan; Ezzeddin Bakhtavar; Kourosh Shahriar; Hassan Khosravi; Sajjad Afraei; Ahlam Maremi

doi:10.1038/s41598-026-35892-w

2026 Jan 28;16:6447.

Naser Badakhshan ¹, Ezzeddin Bakhtavar ², Kourosh Shahriar ¹,✉, Hassan Khosravi ³, Sajjad Afraei ¹, Ahlam Maremi ²

PMCID: PMC12909782 PMID: 41606025

Abstract

Large-scale metal mining operations extract vast quantities of ore and waste rock annually, generating both economic benefits and significant environmental challenges. While mining supports industrial growth, technological advancement, and job creation, it also imposes substantial social and ecological costs, particularly from the disposal of waste rock. These mine wastes increase the burden of handling, dumping, reclamation, and long-term monitoring, often undermining sustainability objectives. Reducing waste generation within ultimate pit limit design is therefore essential to align mining with sustainable development policies. This study develops a novel framework that integrates mathematical modeling with Q-learning, a reinforcement learning algorithm, to optimize the ultimate pit limit by maximizing ore recovery and profitability while minimizing waste rock extraction. A key innovation of the model is the explicit inclusion of environmental costs, covering prevention, mitigation, and compensation of impacts, as a fundamental component of block economic value. The approach is validated using a large-scale copper ore deposit and compared against the widely used Lerchs–Grossmann algorithm. Results show that the Q-learning framework reduces waste rock extraction by 2.7 million tons, with about 0.5 million tons less ore recovered, while also lowering computational time from 7.2 to 5.8 h. Although Lerchs–Grossmann yields slightly higher profit, it ignores environmental costs, leading to less sustainable outcomes. Overall, the framework prioritizes realistic pit design over superior economic gains. By embedding environmental factors into mine planning, it enhances resource efficiency, minimizes ecological impacts, and promotes cleaner production, thereby advancing sustainability in mining through reinforcement learning-based optimization.

Keywords: Waste rock, Ultimate pit limit, Sustainable mining, Reinforcement learning, Optimization

Subject terms: Engineering, Environmental sciences, Environmental social sciences, Mathematics and computing

Introduction

Mining projects extract vast quantities of ore and waste rock, creating both economic value and environmental challenges¹. Among these, waste rock generation is particularly critical because it increases the burden of handling, dumping, reclamation, and long-term monitoring, often leading to severe ecological risks such as acid mine drainage (AMD) and land degradation. Minimizing waste rock at the mine design stage is therefore central to advancing sustainable mining practices^2,3.

In open-pit mine design, the ultimate pit limit (UPL) defines the maximum economically mineable extent of a deposit, considering long-term parameters such as ore grades, market prices, mining and processing costs, and slope stability constraints^4–6. The UPL represents the outer boundary of feasible mining and serves as the foundation for subsequent stages of mine planning, including production scheduling, waste dump allocation, and processing capacity planning^7,8. Since UPL determination influences resource utilization, waste generation, and overall project feasibility, it is regarded as a critical decision in mine design⁷.

Over the past decades, numerous approaches have been developed for UPL optimization. Traditional methods, most notably the Lerchs–Grossmann (LG) algorithm, provide mathematically rigorous solutions but often assume static economic conditions and ignore environmental factors. It should be explicitly noted that the classical LG algorithm is formulated as a single-objective optimization method that maximizes economic value based on revenues and operational costs, and does not natively incorporate environmental externalities unless such costs are manually embedded into block values. Alternative approaches such as linear programming, maximum-flow algorithms, and heuristic or metaheuristic techniques, including genetic algorithms and graph-based methods, have been explored to improve efficiency and adaptability^9–14. More recently, stochastic and sustainability-oriented frameworks have emerged, aiming to incorporate uncertainty in metal prices and ecological considerations into UPL determination^15–20. Table 1 summarizes selected contributions in this field.

Table 1.

Summary of the literature on determining open-pit limits.

Reference/year	Research focus	Orebody/mining scale	Key specifications/characteristics
Marquina Araujo et al.¹⁵	UPL using Pseudoflow	Copper/small	Demonstrates efficient pit-boundary delineation and consistency with benchmark methods; performance moderated by problem simplification and solution time
Holloway¹⁶	UPL under risk conditions	Copper/small	Integrates explicit risk quantification, enhancing selection reliability; computational demand increases for large-scale implementations
Azadi et al.¹⁷	UPL using GA	Hypothetical/small	Provides rapid, scalable solution generation for large datasets; yields approximate solutions that remain sensitive to model simplifications
Mwangi et al.¹⁸	UPL using maximum-flow algorithm	Gold-copper/small	Produces consistent pit values with 12–16% reduction in computation time; retains approximation inherent to maximum-flow formulations
Chatterjee et al.¹⁹	UPL under price uncertainty	Iron/large	Highly effective in volatile market conditions, achieving notable NPV improvement; scenario-level modelling imposes significant computational complexity
Adibi and Ataee-pour²⁰	Sustainability-oriented UPL	Iron/medium	Embeds sustainable development principles, generating larger and more sustainability-aligned pits; environmental modelling simplified relative to full LCA frameworks
Rimélé et al.²¹	Stochastic optimization under grade and price uncertainty	Copper/large	Captures both grade and price uncertainty; suitable for large copper systems, though computational complexity escalates with increased block and time-step resolution

Despite advancements in UPL optimization, two critical gaps remain. First, most methods focus solely on maximizing economic returns, overlooking environmental costs such as pollution, ecosystem degradation, and biodiversity loss^20,21. Second, existing methods often lack adaptability to dynamic mining environments, where ore grades, commodity prices, and environmental regulations continuously evolve²². Some studies have highlighted the feasibility of integrating ecological costs into mine planning, for instance by internalizing carbon emissions and ecosystem service losses²³ or incorporating carbon pricing into long-term planning²⁴. However, these approaches primarily address production scheduling rather than early-stage UPL design. Machine learning, particularly reinforcement learning (RL), offers promising potential for addressing these gaps by enabling adaptive decision-making under dynamic conditions, optimizing economic, technical, and environmental outcomes simultaneously.

In recent years, RL has attracted significant interest in the mining sector due to its capacity to improve operational efficiency, reduce risks, and enhance safety. The breadth of RL applications in mining highlights its role in optimizing operations, predicting and managing risks, improving safety, streamlining supply chains, and analyzing large volumes of sensor-based data. These diverse applications demonstrate RL’s flexibility and the growing potential to integrate it across the mine life cycle^8,25–28. Importantly, RL does not merely improve efficiency but also contributes to cleaner production by minimizing waste, preventing accidents, and enabling more sustainable resource utilization.

This study introduces a novel RL-based framework for UPL optimization that explicitly integrates environmental costs into block economic values, ensuring that pit designs are both economically viable and environmentally responsible. The main contributions of this research are threefold: first, the development of a tailored RL framework that dynamically balances ore recovery, profitability, and waste rock minimization, leveraging RL’s adaptive capabilities to handle complex, evolving mining conditions; second, the explicit integration of environmental costs, including prevention, mitigation, and compensation for ecological impacts, into block economic values, aligning UPL design with cleaner production principles; and third, the validation of the proposed approach on a large-scale open-pit mine and benchmarking against the widely used LG algorithm.

By jointly addressing economic and environmental objectives, this research demonstrates how reinforcement learning can extend UPL optimization beyond purely profit-driven formulations and support cleaner production in open-pit mining. The proposed approach provides a pathway for reducing waste rock generation, improving resource efficiency, and aligning mine design with long-term sustainability objectives.

The determination of the ultimate pit limit is a foundational step in open-pit mine design and has traditionally been treated as a static, single-objective optimization problem solved using the Lerchs–Grossmann algorithm. While the LG algorithm is mathematically rigorous and effective under deterministic economic assumptions, it optimizes pit geometry solely based on block economic values and does not natively incorporate environmental costs such as waste rock management, land disturbance, or long-term remediation liabilities. This structural limitation restricts its ability to address the multi-objective nature of modern mining, where economic performance must be balanced with environmental responsibility.

To overcome these limitations, this study redefines UPL optimization as a policy-learning problem rather than a static profit contour. A reinforcement learning framework based on Q-learning is proposed, in which environmental costs are explicitly internalized within the block economic value and embedded directly in the reward structure of the optimization process. As a result, the learned extraction policy represents a balanced compromise between profitability and sustainability, rather than a purely economic optimum.

Methodology

Open-pit mine design is constrained by technological, geological, and economic factors, meaning that only part of a deposit’s geological resource can be extracted profitably. This portion is defined by a three-dimensional geometry known as the UPL, which establishes the maximum economic boundary of the deposit. Determining the UPL is a fundamental step in mine planning and design because it sets the framework for subsequent design tasks, including production scheduling, waste dump allocation, processing plant sizing, and infrastructure placement²³. The UPL therefore plays a decisive role in both the economic feasibility and the environmental footprint of a mining project.

This research proposes a novel methodology that employs RL to determine the UPL as outlined in Fig. 1. RL, a subfield of machine learning, enables an agent to interact with its environment and gradually learn optimal strategies through feedback in the form of rewards or penalties. By continuously refining its strategy, the agent improves decision-making over time. Applied to UPL optimization, RL offers significant advantages over traditional methods: it adapts dynamically to changing conditions, incorporates multiple objectives, and allows the explicit inclusion of environmental costs alongside economic performance. Table 2 summarizes the fundamental concepts of RL that form the basis of this methodology^29–32.

Table 2.

Critical concepts in RL.

Row	Item	Description
1	Agent	The agent can be a robot, software, or any entity that performs actions to achieve a goal
2	Environment	The environment includes all the conditions and data with which the agent interacts
3	Action	Actions are the steps the agent takes in each state that affect the environment
4	State	A state defines the specific situation of the environment at a given moment
5	Reward	After each action, the agent receives a reward indicating the quality of that action relative to the final goal
6	Policy	A policy is the strategy that specifies the action to take in each state
7	Learning from experience	Through repeated interaction, the agent learns from experience and improves its policy over time

The proposed Q-learning framework determines the sustainable UPL by integrating mining economics with essential geotechnical and environmental constraints. A fixed 45° slope angle serves as a key geotechnical input, controlling pit stability and the volume of material that can be extracted profitably. This constraint, together with the geological block model and economic and environmental parameters, defines the agent’s feasible actions and governs state transitions³³. The optimization process follows an iterative learning cycle in which the Q-table is initialized and updated using the learning rate (α), discount factor (γ), and exploration rate (ε). At each step, the agent evaluates the current pit geometry, selects a block extraction action (a) through an ε-greedy policy, and receives a reward (r) based on both economic value and adherence to slope stability requirements. Q-values are then updated using the Bellman-based rule, gradually improving the agent’s policy as it interacts with the environment³⁴. This iterative process continues until convergence, after which the final UPL is extracted from the learned policy by identifying the sequence of actions that produces the highest cumulative discounted reward.

In this study, the RL framework is applied to the UPL problem by explicitly modeling the sets, parameters, and decision variables that capture the economic and environmental dimensions of open-pit mining. The methodology defines the objective function and constraints within an RL structure to guide the agent toward identifying the pit configuration that maximizes resource recovery while minimizing waste extraction and environmental costs.

Within this framework, Q-learning is selected as the reinforcement learning approach based on both methodological and practical considerations. The UPL problem is inherently discrete and deterministic, characterized by a finite and well-defined state–action space derived from the geological block model and slope constraints. Under such conditions, tabular Q-learning is particularly suitable, as it allows direct representation of state–action values without reliance on function approximation. Compared to deep reinforcement learning approaches such as Deep Q Networks or policy-gradient methods, Q-learning offers greater algorithmic transparency and numerical stability, enabling direct inspection and interpretation of learned policies. This interpretability is especially important in strategic mine design, where decision traceability and robustness are critical for engineering validation and stakeholder confidence. Furthermore, Q-learning involves lower computational complexity and reduced sensitivity to hyperparameter tuning, making it more practical for large-scale mine design applications in which explainability, stability, and repeatability are prioritized over black-box performance gains.

The model assumes that a reliable geological block model is available, and a 45° slope angle is imposed to ensure geotechnical stability³⁵. Geological and economic uncertainties are excluded to focus on the methodological contribution of RL. The basic decision-making units in this framework are block mining units (BMUs), which represent the three-dimensional blocks into which the deposit is divided. For the case study, the BMUs have dimensions of X = 25 m, Y = 25 m, and Z = 12.5 m.

Table 3 summarizes the indices, sets, parameters, and decision variables employed in the RL-based UPL model. This structured formulation ensures clarity in representing the complex interrelationships between economic value, environmental cost, and operational feasibility.

Table 3.

Symbols used in the RL-based UPL determination model.

Row	Index	Symbol
1	Row index of mining blocks in the x-direction	i {1, …, I}
2	Row index of mining blocks in the y-direction	j {1, …, J}
3	Row index of mining blocks in the z-direction	k {1, …, K}
4	Block position index	i, j, k
	Set
5	Set of open-pit blocks (mining units)
6	Set of open-pit blocks at horizon L
7	Set, for each open-pit block (i, j, k), of the preceding blocks that must be mined before block (i, j, k) is mined. Here, Z denotes the total number of blocks in this set
8	Set of immediate previous variables	Z =
9	For each horizon, set , which determines the total number of mining blocks (workshops) accessible for BC mining at that horizon. Here, S is the total number of blocks (workshops) in
	Parameter
10	Income of selling open-pit product in block i,j,k mining cost of block i, j,k as the ore and its processing
11	Open-pit environmental cost in block i, j, k (ore + waste) per ton
12	Waste disposal cost of mining block i, j, k under open pit
13	Average ore grade in blocks i, j, k under open pit
14	Ore tonnage of block i, j, k (mineralized materials) under open pit
15	Waste tonnage of mining block i, j, k (non-mineralized materials) under open pit
16	Recovery rate
17	Selling price of minerals
18	Selling cost of minerals
19	Overhead per ton of ore for extraction and mining
20	Cost per bench or horizon of extracting one ton of ore using open pit
21	Processing cost of one ton of ore
22	Processing cost of ore in block i, j, k for open pit
23	Environmental cost of one ton of extracted ore using open pit
24	Environmental cost increment factor per bench
25	Royalty
26	Cost of extracting one ton of ore using open pit
27	Learning rate (the amount of impact of new learning on the Q-value)	α
28	Discount factor (a measure of the importance of future rewards compared to immediate rewards)	γ
29	Probability of selecting a random action (exploration)	ε
	Variable
30	Current state	s
31	Current action taken	a
32	Reward received after action	r
33	Previous state
34	Previous action taken

In this research, the RL framework is structured around six key elements: environment, agent, state, action, policy, and reward^35,36. These elements allow the algorithm to continuously interact with the mining system and adapt toward more sustainable and profitable pit designs.

Environment

The environment represents the mining system, modeled as a set of geological blocks. To determine the UPL, each block is assigned an economic value that integrates revenues and costs. These costs include not only mining, processing, and refining expenses but also environmental costs, which are often excluded in conventional methods such as LG. The block economic value (BEV) is calculated through Eq. 1.

Revenues are derived from ore tonnage, grade, recovery rate, and mineral selling price, offset by overhead costs, using Eq. 2. Environmental costs, given by Eq. 3, are modeled per ton of ore and waste and adjusted for each mining horizon in Eq. 4 to account for increasing impacts at greater depths. Mineral processing costs and extraction costs are also explicitly calculated through Eqs. 5 and 6, respectively.

This formulation allows for the inclusion of prevention, mitigation, and compensation costs related to environmental impacts such as land degradation, biodiversity loss, and pollution. Although this reduces short-term profit, it produces pit designs that are closer to reality and aligned with sustainable development objectives. This approach is based on the framework presented in reference⁵, which systematically quantifies environmental costs throughout the mining life cycle (Table 4).

Table 4.

Steps for calculating environmental costs in UPL determination.

Step	Stage	Description
First	Identifying cost factors	Costs depend on human development index (HDI), mine scale, proximity to populations, mining method, mineral type, and ecological sensitivity
	Mining activities	Activities across exploration, construction, operation, and closure are assessed for potential environmental harm
	Environmental components	Impacts are evaluated on air, water, land, ecology, and socio-economic systems
Second	The environmental costs of mining: Here, C_E denotes the environmental cost of extracting one ton of ore ($/ton). The term C_mn reflects the cost of adverse impacts arising from mining activity m on environmental sector n, while U_mn represents the degree of uncertainty associated with this impact. In this model, F_HD corresponds to the Human Development Index (HDI), F_MS indicates the mining scale, F_LM refers to the mine’s proximity to nearby settlements, F_MM captures the selected mining method, F_TM identifies the mineral type, and F_EES accounts for the environmental and ecosystem sensitivities of the mining area	A mathematical relationship incorporating uncertainty estimates costs of prevention, mitigation, and compensation. This accounts for multiple mining activities and affected environmental components. In this approach, the environmental cost assigned to one ton of ore is considered equal to that of one ton of waste in open-pit mining
Third	Quantitative assessment	Applied to the Sarcheshmeh Copper Mine case, with scenarios compared against expert assessments and global benchmarks

This framework ensures that environmental costs are embedded at the earliest stage of mine design, strengthening alignment with cleaner production and sustainable mining policies. The environment for the RL problem consists of the entire set of mining blocks, which can be represented as a 2D (experimental) or 3D (realistic) area.
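The block valuation described in this section can be illustrated with a minimal sketch. The functional form below (revenue minus processing, extraction, and environmental costs, with a linear per-bench increase in the environmental rate) is an assumption made for illustration and is not a transcription of Eqs. 1–6. The names `Block` and `block_economic_value`, the argument names, and all numeric defaults are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Block:
    ore_tonnes: float    # mineralized tonnage in the block
    waste_tonnes: float  # non-mineralized tonnage in the block
    grade: float         # average grade as a fraction (0.01 means 1 %)
    bench: int           # horizon index, 0 = surface bench


def block_economic_value(b, price=9500.0, selling_cost=650.0, recovery=0.97,
                         processing_cost=5.6, mining_cost=2.25,
                         env_cost=0.002, env_increment=0.0):
    """Illustrative BEV with environmental cost internalized (assumed form)."""
    # Revenue from recoverable metal, net of selling cost per ton of metal.
    metal = b.ore_tonnes * b.grade * recovery
    revenue = metal * (price - selling_cost)
    # Processing applies to ore only; extraction applies to all material.
    processing = b.ore_tonnes * processing_cost
    total_tonnes = b.ore_tonnes + b.waste_tonnes
    extraction = total_tonnes * mining_cost
    # Same environmental cost per ton for ore and waste, growing with depth
    # (the linear growth per bench is an assumption).
    env_rate = env_cost * (1.0 + env_increment * b.bench)
    environmental = total_tonnes * env_rate
    return revenue - processing - extraction - environmental


ore_block = Block(ore_tonnes=1000.0, waste_tonnes=0.0, grade=0.01, bench=0)
waste_block = Block(ore_tonnes=0.0, waste_tonnes=1000.0, grade=0.0, bench=0)
print(block_economic_value(ore_block), block_economic_value(waste_block))
```

In this sketch a pure waste block always has a negative value, and that value becomes more negative as the environmental rate grows, which is the mechanism by which internalized environmental costs discourage waste extraction.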

Agent

The agent serves as the decision-maker, selecting blocks to be mined at each step. It learns by testing pit configurations, receiving rewards based on their profitability and sustainability, and refining its strategy over time.

Action

In a 2D representation, an action is expressed as a vector where each entry corresponds to the depth of excavation for a column of blocks. An example of a 2D action vector is shown in Fig. 2.

Fig. 2 — Hypothetical example of an action in a 2D mining environment.

In a 3D environment, an action is represented as an I × J matrix where each cell contains a number between 0 and K. Each entry specifies the excavation depth, i.e., how many blocks are removed from the corresponding column (i, j). A sample matrix illustrates operations in a realistic 3D environment with a length of 5 blocks and a width of 4 blocks.
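As an illustration of this encoding, the following sketch converts a depth matrix into a boolean mask of extracted blocks. The matrix values, the helper name `action_to_mask`, and the convention that level 0 is the surface bench are assumptions chosen for the example.

```python
import numpy as np


def action_to_mask(depths, n_levels):
    """Expand an I x J depth matrix into an I x J x K mask of mined blocks."""
    depths = np.asarray(depths)
    if depths.min() < 0 or depths.max() > n_levels:
        raise ValueError("each depth must lie between 0 and K")
    levels = np.arange(n_levels)
    # Block (i, j, k) is mined when level k lies above the excavation depth.
    return levels[None, None, :] < depths[:, :, None]


# Hypothetical action for a 5 x 4 plan area with K = 3 levels.
action = np.array([[1, 1, 1, 1],
                   [1, 2, 2, 1],
                   [1, 2, 2, 1],
                   [1, 2, 2, 1],
                   [1, 1, 1, 1]])
mask = action_to_mask(action, n_levels=3)
print(mask.shape, int(mask.sum()))
```

The number of mined blocks equals the sum of the matrix entries, so the depth matrix is a compact description of the pit surface.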

Policy

The policy defines the decision rule the agent follows to maximize rewards. In this context, it seeks to maximize cumulative profits while respecting slope stability constraints. The objective function is expressed as Eq. 7.

To maintain geotechnical stability, slope constraints are applied such that any block can only be mined if all overlying blocks within the 45° slope angle have first been removed. In a 3D environment, the action vector must satisfy Eqs. 8, 9, and 10. Figure 3 illustrates the precedence requirement for block B, where deeper blocks can only be accessed after clearing their supporting overburden.

Fig. 3 — Block mining precedence under slope constraints in open-pit mining.

The relationships governing block removal in a 3D environment are expanded in Eqs. 11–19.
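A precedence rule of this kind can be checked directly on the depth matrix. The sketch below is one possible formulation, assuming that the slope limit translates into a maximum depth difference of `max_step` levels between a column and each of its eight neighbors, and that ground outside the model stays unmined. The function name, the neighborhood pattern, and `max_step` are assumptions; the number of levels that corresponds to 45° depends on the block dimensions.

```python
import numpy as np


def is_slope_feasible(depths, max_step=1):
    """Return True if no column is deeper than a neighbor by more than max_step."""
    d = np.asarray(depths)
    n_i, n_j = d.shape
    # Pad with zeros: columns outside the model are treated as unmined.
    padded = np.pad(d, 1, constant_values=0)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbor = padded[1 + di:1 + di + n_i, 1 + dj:1 + dj + n_j]
            if np.any(d - neighbor > max_step):
                return False
    return True


stable = np.array([[0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 1, 2, 1, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 0, 0, 0]])
too_steep = stable.copy()
too_steep[2, 2] = 3  # center column is two levels deeper than its neighbors
print(is_slope_feasible(stable), is_slope_feasible(too_steep))
```

Because every column is compared with all of its neighbors, the check covers both directions of each pair, so a single pass is enough to validate a whole action.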

Reward and punishment

Rewards and penalties for an agent’s actions are normalized between −1 and +1. Ideally, all blocks with positive economic value (BEV⁺), considering the environmental costs, are mined to achieve maximum profit. The normalized reward is calculated using Eq. 20.

To enforce compliance with the slope constraint, a negative reward is added to Eq. 20, as defined in Eq. 21.
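One way such a normalized, penalized reward could be computed is sketched below. It assumes that the pit profit is divided by the sum of all positive block values (the ideal case in which every BEV⁺ block is mined at no waste cost), that a fixed penalty is subtracted when the slope constraint is violated, and that the result is clipped to [−1, +1]. These choices, the function name `normalized_reward`, and the `penalty` value are illustrative assumptions rather than the exact forms of Eqs. 20 and 21.

```python
import numpy as np


def normalized_reward(bev, depths, slope_ok=True, penalty=1.0):
    """Pit profit scaled by the sum of positive BEVs, with a slope penalty."""
    bev = np.asarray(bev, dtype=float)   # shape (I, J, K), level 0 on top
    depths = np.asarray(depths)          # shape (I, J), excavation depths
    levels = np.arange(bev.shape[2])
    mined = levels[None, None, :] < depths[:, :, None]
    profit = bev[mined].sum()
    best = bev[bev > 0].sum()            # upper bound on achievable profit
    reward = profit / best if best > 0 else 0.0
    if not slope_ok:
        reward -= penalty                # discourage infeasible geometries
    return float(np.clip(reward, -1.0, 1.0))


# Tiny hypothetical model: one row, two columns, two levels.
bev = np.array([[[-1.0, 4.0],
                 [-1.0, -2.0]]])
print(normalized_reward(bev, [[2, 0]]))                  # ore column mined
print(normalized_reward(bev, [[2, 0]], slope_ok=False))  # same pit, penalized
```

Scaling by the sum of positive values keeps rewards comparable across deposits of different size, which is convenient when the same hyperparameters are reused for 2D and 3D models.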

Using this framework, the problem is solved with the Q-learning algorithm, a widely used RL technique for solving tabular learning problems. This algorithm allows the agent to learn the best strategy from experience without requiring an explicit model of the environment. Key concepts in Q-learning are as follows:

i. Q-function.

The Q-function represents the expected reward for a specific action in a given state, denoted as Q(s,a). It quantifies the quality of action a in state s.

ii. Q-value update.

The Q-values are updated using a learning formula, which is represented in Eq. 22.

iii. Actions and states.

At each state s, the agent selects an action a. This choice can be guided by an ε-greedy strategy, balancing exploration and exploitation.

iv. Extraction strategy.

Once the Q-values converge to acceptable levels, the resulting strategy is extracted by selecting the actions with the highest Q-values³⁷.
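The four concepts above can be tied together in a small, self-contained sketch of tabular Q-learning on a toy 2D block model. It uses the standard update Q(s,a) ← Q(s,a) + α[r + γ max Q(s′,a′) − Q(s,a)]. Everything else is an assumption made for illustration: the 2 × 3 BEV matrix, the state (a tuple of column depths), the actions (deepen one column by one level, or stop), the one-level slope rule, the −1 penalty for infeasible moves, the linearly decaying ε, and the hyperparameter values. It is not the implementation used in this study.

```python
import random
from collections import defaultdict

BEV = [[-1.0, -1.0, -1.0],   # level 0 (surface)
       [-1.0, 5.0, -1.0]]    # level 1
N_LEVELS, N_COLS = len(BEV), len(BEV[0])
STOP = N_COLS                # extra action index that ends the episode
SCALE = sum(v for row in BEV for v in row if v > 0)  # sum of positive BEVs


def feasible(depths, max_step=1):
    """Slope rule: adjacent columns (and the pit rim) differ by <= max_step."""
    padded = (0,) + depths + (0,)
    return all(abs(a - b) <= max_step for a, b in zip(padded, padded[1:]))


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == STOP:
        return state, 0.0, True
    depths = list(state)
    depths[action] += 1
    nxt = tuple(depths)
    if depths[action] > N_LEVELS or not feasible(nxt):
        return state, -1.0, True              # infeasible move is penalized
    return nxt, BEV[depths[action] - 1][action] / SCALE, False


def train(episodes=5000, alpha=0.5, gamma=0.9, eps_start=1.0, eps_end=0.05, seed=0):
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * (N_COLS + 1))
    for ep in range(episodes):
        epsilon = max(eps_end, eps_start * (1 - ep / episodes))
        state, done = (0,) * N_COLS, False
        while not done:
            if rng.random() < epsilon:        # explore
                action = rng.randrange(N_COLS + 1)
            else:                             # exploit
                action = max(range(N_COLS + 1), key=lambda a: q[state][a])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_pit(q):
    """Extract the pit by following the highest Q-value in each state."""
    state, done = (0,) * N_COLS, False
    while not done:
        action = max(range(N_COLS + 1), key=lambda a: q[state][a])
        state, _, done = step(state, action)
    return state


q_table = train()
pit = greedy_pit(q_table)
print(pit)
```

In this toy model the single ore block is worth mining only if the three surface blocks above it are removed first, so the learned greedy policy has to accept three negative rewards before collecting the positive one. This is the kind of delayed trade-off that the discount factor γ controls.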

Results and discussion

To evaluate the proposed RL-based UPL optimization framework, computational tests were performed on both hypothetical 2D and 3D mining scenarios and a large-scale real-world copper mine. The model was implemented in Python 3.13 and executed on a system with a 7th Gen Intel® Core™ i7-7800G7 processor @ 2.40 GHz. Economic, technical, mining, and processing parameters for the UPL calculations were based on hypothetical copper mine data (Table 5). All monetary values are expressed in US dollars. All parameters in Table 5 are defined in Table 3.

Table 5.

Economic and technical data for determining the UPL in 2D and 3D scenarios.

Row	Parameters	Unit	Value
1	and	$	Specified for blocks
2	and .	$	Specified for blocks
3	and	Ton	Specified for blocks
4	and	%	Specified for blocks
5	and	Ton	Specified for blocks
6	and	Ton	Specified for blocks
7		%	0.97
8		$	9500
9		$	650
10		$	0
11		$	0
12		$	5.6
13	and	$	Specified for blocks
14		$	0.002
15		$	0
16		$	10% added value
17		$	2.25
18		–	0.9
19		–	0.5
20		%	0.2
21	I	Number	21(2D)–15 (3D)
22	J	Number	0 (2D)–15 (3D)
23	K	Number	10(2D)–7 (3D)

Hypothetical 2D example

A two-dimensional geological block model consisting of 21 × 10 blocks was first analyzed to test the efficiency and learning behavior of the RL-based solver. In this hypothetical example, each block represents a mining unit with dimensions of 25 × 25 m. The graded geological matrix is shown in Fig. 4A, while the corresponding BEV, calculated using Eq. 1 and the parameters in Table 5, is illustrated in Fig. 4B.

Fig. 4 — Fundamental input models for Q-Learning framework: (A) hypothetical 2D block model, (B) BEVs.

Fig. 5 — Training stages of the RL-based solver in 2D after (A) 10 steps, (B) 100 steps, (C) 1000 steps, and (D) 10,000 steps.

Figure 4A shows a hypothetical 2D geological block model representing a vertical cross-section of the deposit. The model comprises blocks of specified dimensions, with each cell indicating the average ore grade (%) for that block. This model serves as the primary geological input for the optimization.

Figure 4B shows the computed BEV model derived from the geological data in Fig. 4A and the economic/environmental parameters in Table 5. The BEV for each block is calculated using Eq. 1, which incorporates environmental costs. Green-colored blocks represent a positive BEV (economically viable to mine), while red-colored blocks represent a negative BEV (not economically viable when environmental costs are internalized).

During the solution process, the RL-based solver was trained at step counts of 10, 100, 1,000, and 10,000. The training progression at these intervals is illustrated in Fig. 5A–D. Results indicate that as the number of training steps increases, the model converges closer to the ultimate pit configuration. At 10,000 steps, the framework successfully identified a stable solution with a calculated profit of $3,036,605, corresponding to an ultimate pit depth of 9 levels.

Hypothetical 3D example

A hypothetical three-dimensional geological model consisting of 15 × 15 × 7 blocks was constructed to extend the validation of the RL-based framework under more realistic conditions. In this example, each block represents a mining unit with dimensions of 25 × 25 × 12.5 m. Using Eq. 1 and the parameters in Table 5, the BEV model of the deposit was generated. A schematic representation of the resulting 3D mine space is provided in Fig. 6, where yellow blocks denote positive BEV (economically viable with environmental costs included) and blue blocks represent negative BEV (uneconomic or environmentally burdensome).

Fig. 6 — Schematic representation of the 3D mine space (yellow: positive BEV, blue: negative BEV).

To evaluate the model’s learning progression, the algorithm was trained at increasing step counts (10, 100, 1000, 10,000, and 100,000). Figure 7 depicts the evolution of extracted blocks across these training stages. As shown, the model consistently converges toward the optimal pit configuration as the number of steps increases. After 100,000 steps, requiring approximately one hour of computation on the test hardware, the RL framework achieved a stable and acceptable solution. The calculated profit from ore extraction was $11,450,691, with the UPL corresponding to a depth of 7 levels.

Fig. 7 — Blocks to be extracted over different steps (blue: waste, yellow: graded blocks, red: extracted blocks).

The staged progression of the RL model clearly illustrates how the agent incrementally refines its policy over time. At the earliest stages, with only 10 to 100 training steps, the algorithm demonstrates exploratory behavior. Pit boundaries at this point appear irregular and far from optimal, reflecting the agent’s trial-and-error learning process. As the training increases to 1000 and then 10,000 steps, a coherent pit structure begins to emerge, with noticeably improved alignment to both economic and geotechnical constraints. Finally, after 100,000 steps, the pit geometry stabilizes, reflecting convergence to a near-optimal solution. At this stage, the model maximizes net profit while simultaneously minimizing unnecessary waste removal. This dynamic highlights one of RL’s major advantages over deterministic algorithms: instead of following a single fixed optimization path, the agent adapts through iterative feedback. This allows it to continuously balance ore recovery, waste minimization, and slope stability in a flexible and dynamic manner.

To ensure that the solutions produced by the algorithm were not only economically and environmentally sound but also technically feasible, a rigorous slope validation process was embedded within the framework. After each full iteration, the calculated pit walls were compared against the predefined 45° slope angle. If any slopes were found to exceed this threshold, the algorithm automatically identified these deviations and flagged them for review. The system then reverted to the previous iteration and re-weighted decision parameters, such as block priorities or penalty terms, to correct the issue. This validation cycle continued iteratively until all slopes complied with the established geotechnical constraints, ultimately resulting in a geometrically valid ultimate pit. By embedding slope control as a hard constraint within the RL reward function, the algorithm ensured that the final pit geometries consistently satisfied both geotechnical stability and sustainability requirements.

Validation of hypothetical example results

To initially validate the proposed RL-based approach for determining the UPL, its performance was benchmarked against the well-established LG method. The comparative results are presented in Table 6. Both methods produced nearly identical profits, with the RL approach yielding only 0.15% less than the LG method. This minimal difference highlights that the RL model is competitive with the industry standard in terms of economic outcomes.

Table 6.

Results validation based on the hypothetical example.

Method	Total profit ($)	Stripping ratio	Ore (Ton)	Waste rock (Ton)	Average grade (%)
RL-based approach	1,027,983	4	2,182,750	8,775,000	0.41
Lerchs-Grossmann	1,029,484	4.12	2,193,750	9,039,062	0.39

Where the RL method demonstrates a clear advantage is in efficiency and environmental impact. Based on the tabulated values, the stripping ratio fell by approximately 2.91% (from 4.12 to 4), indicating lower waste rock removal for nearly the same ore tonnage (2,182,750 tons for RL versus 2,193,750 tons for LG, a difference of about 0.5%). This corresponds to a 2.92% decrease in waste rock tonnage, from 9,039,062 tons (Lerchs–Grossmann) to 8,775,000 tons (RL), underscoring a notable improvement in environmental performance.
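The percentages quoted for the hypothetical example can be re-derived from Table 6 with a few lines of arithmetic. The sketch below uses only the tabulated numbers; the helper name `pct_reduction` and the variable names are illustrative.

```python
def pct_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Values copied from Table 6 (hypothetical example).
rl = {"profit": 1_027_983, "ore": 2_182_750, "waste": 8_775_000}
lg = {"profit": 1_029_484, "ore": 2_193_750, "waste": 9_039_062}

profit_gap = pct_reduction(lg["profit"], rl["profit"])
waste_cut = pct_reduction(lg["waste"], rl["waste"])
ore_gap = pct_reduction(lg["ore"], rl["ore"])

# Stripping ratio = waste tonnage / ore tonnage.
sr_rl = rl["waste"] / rl["ore"]
sr_lg = lg["waste"] / lg["ore"]
sr_cut_from_tonnages = pct_reduction(sr_lg, sr_rl)
sr_cut_from_table = pct_reduction(4.12, 4.0)  # rounded ratios as tabulated

print(f"profit gap: {profit_gap:.2f}%")
print(f"waste reduction: {waste_cut:.2f}%")
print(f"ore difference: {ore_gap:.2f}%")
print(f"stripping ratios: RL {sr_rl:.4f}, LG {sr_lg:.4f}")
print(f"ratio reduction (tonnages): {sr_cut_from_tonnages:.2f}%")
print(f"ratio reduction (tabulated ratios): {sr_cut_from_table:.2f}%")
```

The 2.91% figure matches the rounded ratios shown in the table (4.12 and 4). Recomputing the ratios from the tabulated tonnages gives a somewhat smaller reduction, because the RL pit also recovers slightly less ore.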

These findings demonstrate that the RL-based approach can achieve comparable profitability while producing more sustainable pit designs with lower waste extraction. Although the differences in this hypothetical case are modest, they highlight the method’s capacity to balance economic and environmental objectives, which is crucial when scaling to larger and more complex deposits.

A large-scale copper mine

The Sarcheshmeh Copper Mine, strategically positioned in Kerman Province approximately 160 km southwest of Kerman City, represents Iran’s premier copper mining operation and a geologically significant porphyry copper deposit (Fig. 8). Situated within the central segment of an extensive NW–SE trending orogenic belt, the mine exhibits characteristic complex folded volcano-sedimentary sequences associated with Late Tertiary magmatic-hydrothermal systems. The deposit genesis is fundamentally linked to Miocene-Pliocene intrusive events, with copper mineralization demonstrating strong spatial and genetic relationships with specific phases of these intrusive complexes³⁸.

Fig. 8 — Location and operation of the Sarcheshmeh copper mine.

The current operational configuration employs advanced open-pit methodology with precisely engineered geotechnical parameters: working benches maintain 12.5 m heights with 62.5° slope angles, while overall pit slopes are optimized between 32 and 34°¹⁹. Lithologically, the sequence is dominated by fine-grained andesite porphyries, with Eocene andesites representing the basal host rock succession. The primary mineralized unit comprises the Sarcheshmeh granodiorite stock, while waste rock assemblages predominantly consist of varietal granodiorite dike complexes exhibiting porphyritic hornblende, feldspar, and biotite differentiates (Fig. 9)³⁹.

Fig. 9 — Geological map of the Sarcheshmeh copper mine area; (a) East–west cross section view, (b) Plan view.

This investigation focuses specifically on drilling operations within the western mining sector, an area characterized by exceptional lithological heterogeneity comprising four principal units: Sarcheshmeh Porphyry (SP), Late Fine Porphyry (LF), Hornblende Porphyry Dike (HD), and Andesite (AN). Each lithological unit displays distinct geochemical signatures, alteration mineralogy, and metallurgical characteristics necessitating specialized geometallurgical modeling and processing strategies. The complex structural framework, featuring gently westward-plunging folds affecting volcanic and conformable sedimentary sequences, with a significant Late Tertiary intrusion positioned proximal to the anticlinal fold axis, creates the fundamental structural controls for mineralization distribution and requires sophisticated geomechanical analysis for optimal mine planning and design³⁸.

Given its scale, longevity, and socio-economic importance, Sarcheshmeh serves as an ideal case study for evaluating innovative mine design methods. The application of RL to this deposit provides a rigorous test of the model’s ability to handle complex geological, technical, and economic conditions. Moreover, the mine’s environmental footprint is large in scale, including waste rock generation, acid mine drainage potential, and land disturbance. This case underscores the necessity of incorporating sustainability and cleaner production principles directly into UPL optimization.

The results, summarized in Table 7, provide a direct comparison between the RL-based method and the LG algorithm under real-world conditions. It is important to emphasize that the profit reported for the LG-based pit is derived under conventional economic assumptions and does not account for environmental costs associated with waste rock extraction, land disturbance, and long-term rehabilitation.

Table 7.

Results validation based on Sarcheshmeh large-scale copper mine.

Method	Total profit ($)	Stripping ratio	Ore (Ton)	Waste rock (Ton)	Average grade (%)
RL-based approach	31,383,382,232	1.5388	1,758,784,621	2,706,483,421	0.412
Lerchs-Grossmann	31,429,206,392	1.5400	1,759,270,715	2,709,276,901	0.411

The comparison between the RL-based framework and the Lerchs–Grossmann algorithm indicates that the proposed approach achieves a total profit very close to that obtained using the conventional method, with a difference of approximately 0.15%. This difference should be interpreted in light of the fundamental objective of the proposed methodology. Rather than maximizing short-term economic return alone, the RL-based framework explicitly prioritizes waste rock reduction by internalizing environmental costs directly into the block economic value. As a result, certain marginal blocks that appear profitable under purely economic assumptions may become uneconomic once environmental impacts are considered.

Consequently, the marginally lower profit obtained using the RL-based approach does not represent a methodological weakness or loss of efficiency. Instead, it reflects a more realistic and sustainability-aligned valuation of the mining project at the design stage. This outcome highlights the deliberate trade-off embedded in the proposed framework, whereby slight reductions in conventional economic indicators are accepted in exchange for improved environmental performance and cleaner production outcomes. In contrast, the Lerchs–Grossmann algorithm optimizes pit geometry under purely economic assumptions and does not natively incorporate environmental costs associated with waste rock management, land disturbance, or long-term remediation liabilities.

One of the most critical advantages of the RL-based approach lies in its ability to reduce environmental burdens while maintaining comparable resource recovery. In the Sarcheshmeh case, the RL framework reduced total waste rock extraction from 2.709 to 2.706 billion tons, while preserving nearly identical ore recovery, with a deviation of less than 0.03%. This result demonstrates that the proposed framework can achieve measurable reductions in waste removal without materially affecting the amount of recoverable ore.
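For reference, the differences implied by Table 7 can be worked out directly from the tabulated tonnages and profits. The symbols W (waste), O (ore), P (profit), and SR (stripping ratio) are shorthand introduced here for illustration only.

```latex
\begin{align*}
\Delta W &= 2{,}709{,}276{,}901 - 2{,}706{,}483{,}421 = 2{,}793{,}480\ \text{t},
  & \Delta W / W_{LG} &\approx 0.10\% \\
\Delta O &= 1{,}759{,}270{,}715 - 1{,}758{,}784{,}621 = 486{,}094\ \text{t},
  & \Delta O / O_{LG} &\approx 0.028\% \\
\Delta P &= 31{,}429{,}206{,}392 - 31{,}383{,}382{,}232 = 45{,}824{,}160\ \$,
  & \Delta P / P_{LG} &\approx 0.15\% \\
SR_{RL} &= \frac{2{,}706{,}483{,}421}{1{,}758{,}784{,}621} \approx 1.5388,
  & SR_{LG} &= \frac{2{,}709{,}276{,}901}{1{,}759{,}270{,}715} \approx 1.5400
\end{align*}
```

In absolute terms this is roughly 2.8 million tons less waste rock and about 0.5 million tons less ore, while in relative terms the waste reduction is about 0.1% and the stripping ratio falls by roughly 0.08%.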

From a mine design perspective, this reduction reflects a more selective and compact ultimate pit geometry arising from the explicit internalization of environmental costs within the optimization process. Marginal blocks located at the periphery or deeper extents of the pit, which remain economically viable under purely profit-driven optimization, are excluded when environmental impacts are accounted for. This leads to pit designs that better balance economic performance with long-term environmental responsibility, reinforcing the practical value of integrating sustainability considerations directly into ultimate pit limit determination.

From an environmental and geochemical perspective, reducing waste volume contributes directly to better control of acid mine drainage (AMD). Lower quantities of exposed waste rock mean reduced surface area for oxidation of sulfide minerals, ultimately minimizing the generation of acidic runoff and metal leachate³⁸. By curbing AMD risk at the mine design stage, the RL approach not only supports cleaner production but also lowers future costs associated with water treatment, tailings stabilization, and environmental remediation.

The proposed mechanism for reducing the stripping ratio and waste rock volume operates through a fundamental adjustment to the BEV calculation. In our methodology, the BEV incorporates not only traditional revenue and operational costs but also an environmental cost per ton for both ore and waste rock (E_ijk), as defined in Eq. 1. This inclusion raises the profitability threshold for block extraction, so that marginal blocks, which register a positive BEV under the LG method, acquire a negative BEV once the environmental costs associated with waste handling are charged. Since these marginal blocks are typically situated at the deeper extents and peripheries of the pit, their exclusion from the final design yields a shallower and more compact ultimate pit geometry. This modification translates directly into a reduction in the total volume of waste rock that must be handled, thereby lowering the overall stripping ratio. The outcome represents a conscious trade-off, accepting a marginally lower total profit in exchange for environmental benefits and reduced long-term liabilities. It is critical to note that “waste rock” in this context refers specifically to non-mineralized or sub-economic material requiring removal to access ore, distinct from mill tailings⁴⁰.

In addition to these environmental advantages, the RL-based model demonstrated better computational efficiency, completing the optimization in approximately 5.8 h, a reduction of roughly 19% compared to the LG method’s 7.2 h. While absolute computation time alone is an imperfect metric for direct comparison due to its dependence on specific hardware and implementation details, this increase in speed enables more rapid scenario analysis and enhances responsiveness to dynamic real-world changes in commodity prices, policy constraints, and stakeholder expectations, making it advantageous for modern, adaptive mine planning⁴¹.
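The effect of the BEV adjustment on a marginal block can be sketched as follows. Eq. 1 is not reproduced here, so the functional form, the function name `block_economic_value`, the simplification that any block with positive grade is processed as ore, and every numerical value are assumptions made purely for illustration.

```python
def block_economic_value(tonnage, grade, price, recovery,
                         mining_cost, processing_cost, env_cost=0.0):
    """Illustrative BEV in dollars for one block (assumed form, not Eq. 1).

    grade is a mass fraction, price is in $ per ton of metal, and all
    costs are in $ per ton of rock. env_cost plays the role of the
    per-ton environmental cost term.
    """
    if grade > 0.0:
        # Ore block: pays mining, processing, and environmental costs.
        revenue = tonnage * grade * recovery * price
        costs = tonnage * (mining_cost + processing_cost + env_cost)
    else:
        # Waste block: no revenue, but still mined and charged env_cost.
        revenue = 0.0
        costs = tonnage * (mining_cost + env_cost)
    return revenue - costs


# Hypothetical marginal block: 0.4% grade, barely profitable.
marginal = dict(tonnage=1000.0, grade=0.004, price=8000.0, recovery=0.85,
                mining_cost=2.0, processing_cost=24.0)

bev_conventional = block_economic_value(**marginal)
bev_with_env = block_economic_value(**marginal, env_cost=1.5)

print(f"BEV without environmental cost: {bev_conventional:.0f} $")
print(f"BEV with environmental cost:    {bev_with_env:.0f} $")
```

With these made-up inputs the block is worth extracting under a purely economic valuation but becomes a net loss once a per-ton environmental cost is charged, which is the sign change that removes marginal blocks from the ultimate pit.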

Crucially, the RL framework goes beyond simple profit maximization by internalizing key environmental costs, such as waste disposal, land disturbance, and rehabilitation, into the block economic value. This results in a more holistic and realistic evaluation of project sustainability. What appears as a slight reduction in short-term profit can, in reality, translate to long-term financial and social gains by reducing waste handling expenses, improving closure outcomes, and enhancing the social license to operate.

When environmental considerations and sustainable development are priorities: The primary mechanism of RL’s superiority lies in the direct integration of environmental costs into the BEV. This integration renders the extraction of marginal blocks uneconomical. These blocks, while offering low net economic benefits, incur high environmental costs due to the associated waste rock removal. Consequently, the resulting ultimate pit is more “realistic” and sustainable.

In dynamic environments with high uncertainty: The RL method is inherently designed for adaptability. If parameters such as metal prices or environmental costs change over time, the trained agent can be rapidly fine-tuned with new data. In contrast, the LG method requires a complete re-execution from scratch under changed conditions.

Overall, while the LG method may produce a marginally higher profit under conventional economic assumptions, the RL-based approach offers a solution better aligned with the principles of sustainable mining, emphasizing environmental responsibility, economic realism, and computational efficiency. The validation at Sarcheshmeh confirms that the RL framework is not only technically and economically viable but also environmentally and ethically superior, positioning it for large-scale open-pit mine design in the era of cleaner production and sustainable development.

Superiorities and limitations

The results across both hypothetical and real-world cases highlight several advantages of the RL framework over traditional methods such as LG.

First, RL demonstrates progressive convergence and adaptability. Through incremental training, the agent refines its strategy, moving from exploratory solutions to stable, near-optimal pit geometries. This dynamic learning process enables RL to balance ore recovery, waste minimization, and slope stability more effectively than deterministic algorithms, which follow a single optimization path.

Second, RL naturally supports cleaner production principles. By embedding environmental costs into block economic values, the framework inherently disfavors extraction of marginal or environmentally costly blocks. The resulting pit geometries are typically shallower and more efficient, leading to reduced waste rock volumes, shorter haulage distances, lower fuel consumption, and decreased greenhouse gas emissions. In this sense, RL simultaneously improves both economic and ecological performance.

Third, the method provides flexible decision support for mine planners. Because the RL agent adapts to changing input parameters, it is capable of incorporating dynamic variables such as commodity price fluctuations, operational cost changes, or evolving environmental regulations. This adaptability reduces the need to restart optimization processes when conditions change, saving significant time and resources in practice.

Nevertheless, certain limitations must be acknowledged. The RL approach requires substantially more training time in smaller-scale or experimental problems compared to LG, which produces exact solutions more quickly in such cases. Additionally, convergence is not guaranteed to the global optimum, and solution quality depends on the number of training steps and algorithmic parameter settings. These characteristics introduce uncertainty in determining when sufficient training has been achieved.

Despite these challenges, RL’s scalability and transferability make it exceptionally promising for large-scale, real-world mining applications. In such contexts, the method’s ability to integrate sustainability objectives, reduce computational effort over repeated runs, and deliver more balanced outcomes strongly outweighs its drawbacks. Future research should focus on reducing training time through algorithmic refinements and expanding the integration of social and ecological indicators, thereby strengthening RL’s role in advancing sustainable mine design.

Conclusions

Open-pit mining operations create considerable economic value but also introduce serious environmental challenges, most notably the management of waste rock, which directly influences ecological stability, rehabilitation complexity, and long-term liabilities. This research tackled these concerns at the design level by embedding environmental costs into UPL optimization through an RL framework based on Q-learning. Unlike conventional methods that prioritize profit maximization, the proposed RL approach redefines block economic values to internalize the costs of prevention, mitigation, and compensation, resulting in a design philosophy that aligns with both economic efficiency and environmental stewardship. The comparative validation confirmed that the RL framework produces pit designs that are economically competitive with industry-standard algorithms while offering distinct advantages in environmental performance. It consistently reduced unnecessary waste extraction, achieved cleaner ore profiles, and delivered outcomes that are more compatible with sustainable development goals. These findings demonstrate that even modest improvements in material movement translate into substantial environmental benefits when applied to large-scale operations, particularly by lowering the long-term liabilities of waste disposal, land disturbance, and acid mine drainage management. Another significant finding is the computational efficiency of the RL-based approach. By converging to valid solutions more quickly than the Lerchs–Grossmann algorithm, it enhances the ability of planners to explore multiple design scenarios, adapt to changing market conditions, and respond to evolving environmental regulations. This agility is increasingly vital in an industry where economic volatility and regulatory expectations demand rapid and robust decision-making.

Perhaps the most important outcome of this research is the demonstration that profit alone is not a sufficient indicator of project viability. Designs that ignore environmental costs may appear more profitable in the short term, but they underestimate future liabilities and misrepresent long-term sustainability. The RL framework, by internalizing these costs into early-stage design decisions, provides a more realistic and balanced measure of project value. This shift not only improves long-term economic resilience but also strengthens social license to operate by aligning with community expectations and global sustainability standards. By reducing waste, improving computational efficiency, and embedding environmental considerations into design, the RL framework advances the practice of mine planning toward cleaner production and greater ecological responsibility. Future research should focus on refining algorithmic efficiency, extending the integration of social and ecological indicators, and testing across diverse mineral deposits. Such developments will further consolidate the role of artificial intelligence in shaping resilient, responsible, and sustainable mining operations.

Acknowledgements

We thank Rayan Pzhohan Olom Zamin (RPOZ) and its staff for their helpful cooperation in collecting geological, technical, and operational data from the Sarcheshmeh copper mine.

Author contributions

Naser Badakhshan: Conceptualization, Data curation, Methodology, Software, Visualization, Writing—original draft; Ezzeddin Bakhtavar: Conceptualization, Formal Analysis, Methodology, Supervision, Validation, Writing—original draft, Writing—review & editing; Kourosh Shahriar: Supervision, Writing—review & editing; Hassan Khosravi: Methodology, Software, Visualization; Sajjad Afraei: Supervision, Writing—review & editing; Ahlam Maremi: Supervision, Writing—review & editing.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

Data availability

All data generated or analysed during this study are included in this published article.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Hosseinpour, M., Osanloo, M. & Azimi, Y. Evaluation of positive and negative impacts of mining on sustainable development by a semi-quantitative method. J. Clean. Prod. 366, 132955 (2022).
2. Bakhtavar, E., Saberi, S., Hu, H., Sadiq, R. & Hewage, K. Fuzzy cognitive-based goal programming for waste rock management with in-pit dumping priority: Towards sustainable mining. Resour. Policy 86, 104095 (2023).
3. Alshibani, A. et al. Advancing sustainability: An integrated decision support framework for fleet selection in open pit mining construction. Res. Eng. 23, 102501 (2024).
4. Mwangi, A. D., Jianhua, Z., Gang, H., Kasomo, R. M. & Innocent, M. M. Ultimate pit limit optimization methods in open-pit mines: A review. J. Min. Sci. 56, 588–602 (2020).
5. Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Determining the environmental costs of mining projects: A comprehensive quantitative assessment. Resour. Policy 82, 103561 (2023).
6. Marquina-Araujo, J. J., Cotrina-Teatino, M. A., Mamani-Quispe, J. N., Ccatamayo-Barrios, J. H., Ortiz-Quintanilla, S. M., Antonio-Araujo, E. & Portilla-Rodriguez, H. R. Delimitation of the final pit in open pit mines using the pseudoflow maximum flow algorithm: A comparative analysis of 1×5 and 1×9 arcs. Mathematical Modelling of Engineering Problems 11(7) (2024).
7. Quelopana, A. & Navarra, A. Integration of strategic open-pit mine planning into hierarchical artificial intelligence. J. South Afr. Inst. Min. Metall. 121(12), 643–652 (2021).
8. Akbari, P., Valencia, S. & Morales, N. Automating and optimising pushback selection using reinforcement learning. Min. Technol. 134(4), 274–288 (2025).
9. Deutsch, M., Dağdelen, K. & Johnson, T. An open-source program for efficiently computing ultimate pit limits: Mineflow. Nat. Resour. Res. 31(3), 1175–1187 (2022).
10. Díaz, A. B., Álvarez, I. D., Fernández, C. C., Krzemień, A. & Rodríguez, F. J. I. Calculating ultimate pit limits and determining pushbacks in open-pit mining projects. Resour. Policy 72, 102058 (2021).
11. Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Evaluating the impacts of the transition from open-pit to underground mining on sustainable development indexes. J. Sustain. Min. 22(2), 154 (2023).
12. Saleki, M., Kakaie, R. & Ataei, M. Mathematical relationship between ultimate pit limits generated by discounted and undiscounted block value maximization in open-pit mining. J. Sustain. Min. 18(2), 94–99 (2019).
13. Liu, F. et al. Open pit limit optimization considering the pumped storage benefit after mine closure: A case study. Geomech. Geophys. Geo-Energy Geo-Resour. 10(1), 44 (2024).
14. Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Optimization of transition from open-pit to underground mining considering environmental costs. Resour. Policy 95, 105178 (2024).
15. Marquina Araujo, J. J., Cotrina Teatino, M. A., Noriega Vidal, E. M. & Mamani Quispe, J. N. Delimitation of the final pit of open-pit mines using the maximum flow Pseudoflow method. Int. J. Min. Geo-Eng. 58(1), 105–111 (2024).
16. Holloway, E. Risk in Ultimate Pit Selection. Min. Metall. Expl. 41(2), 589–605 (2024).
17. Azadi, N., Mirzaei-Nasirabad, H. & Mousavi, A. Evaluating the efficiency of the genetic algorithm in designing the ultimate pit limit of open-pit mines. Int. J. Min. Geo-Eng. 57(1), 55–58 (2023).
18. Mwangi, A. D., Jianhua, Z., Gang, H., Kasomo, R. M. & Matidza, I. M. Ultimate pit limit optimization using Boykov-Kolmogorov maximum flow algorithm. J. Min. Environ. 12(1), 1–13 (2021).
19. Chatterjee, S., Sethi, M. R. & Asad, M. W. A. Production phase and ultimate pit limit design under commodity price uncertainty. Eur. J. Oper. Res. 248(2), 658–667 (2016).
20. Adibi, N. & Ataee-pour, M. Consideration of sustainable development principles in ultimate pit limit design. Environ. Earth Sci. 74, 4699–4718 (2015).
21. Rimélé, A., Dimitrakopoulos, R. & Gamache, M. A dynamic stochastic programming approach for open-pit mine planning with geological and commodity price uncertainty. Resour. Policy 65, 101570 (2020).
22. Wang, S. et al. Dynamic optimization design of open-pit mine full-boundary slope considering uncertainty of rock mass strength. Sci. Rep. 14, 19710 (2024).
23. Xu, X. et al. Production scheduling optimization considering ecological costs for open pit metal mines. J. Clean. Prod. 180, 210–221 (2018).
24. Mirzehi, M. & Moradi Afrapoli, A. A novel framework for integrating environmental costs and carbon pricing in open-pit mine plans: Towards sustainable and green mining. J. Clean. Prod. 468, 143059 (2024).
25. Huo, D., Sari, Y. A., Kealey, R. & Zhang, Q. Reinforcement learning-based fleet dispatching for greenhouse gas emission reduction in open-pit mining operations. Resour. Conserv. Recycl. 188, 106664 (2023).
26. Liu, S. Q., Liu, L., Kozan, E., Corry, P., Masoud, M., Chung, S. H. & Li, X. Machine learning for open-pit mining: a systematic review. International Journal of Mining, Reclamation and Environment, 1–39 (2024).
27. Choi, Y., Nguyen, H., Bui, X. N. & Nguyen-Thoi, T. Optimization of haulage-truck system performance for ore production in open-pit mines using big data and machine learning-based methods. Resour. Policy 75, 102522 (2022).
28. Badakhshan, N., Bakhtavar, E., Shahriar, K., Afraei, S. & Ben-Awuah, E. Unmanned mining fleet management: A Multi-objective framework integrating deep reinforcement learning and internet of things. Expert Syst. Appl. 287, 128238 (2025).
29. Noriega, R., Pourrahimian, Y. & Askari-Nasab, H. Deep reinforcement learning based real-time open-pit mining truck dispatching system. Comput. Oper. Res. 173, 106815 (2025).
30. Hazrathosseini, A. & Moradi Afrapoli, A. Transition to intelligent fleet management systems in open-pit mines: A critical review on application of reinforcement-learning-based systems. Min. Technol. 133(1), 50–73 (2024).
31. Goldstein, D. M., Aldrich, C. & O’Connor, L. A Review of orebody knowledge enhancement using machine learning on open-pit mine measure-while-drilling data. Mach. Learn. Knowl. Extract. 6(2), 1343–1360 (2024).
32. Krop, I. et al. Assessment of selected machine learning models for intelligent classification of flyrock hazard in an open-pit mine. IEEE Access 12, 8585–8608 (2024).
33. Cao, P. et al. Inversion of mine ventilation resistance coefficients enhanced by deep reinforcement learning. Process Saf. Environ. Prot. 182, 387–404 (2024).
34. Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992). doi:10.1007/BF00992698
35. Alemayehu, E. et al. Optimizing design and stability of open pit slopes in Tolay coal mine, Ethiopia. Sci. Rep. 15, 1570 (2025).
36. Ugurlu, O. F., Fan, C., Jiang, B. & Liu, W. V. Deep neural network models for improving truck productivity prediction in open-pit mines. Min. Metall. Expl. 41(2), 619–636 (2024).
37. Zhou, Q. et al. An optimized Q-learning algorithm for mobile robot local path planning. Knowl.-Based Syst. 286, 111400 (2024).
38. Bakhtavar, E., Hosseini, S., Main, H., Hewage, K. & Sadiq, R. Robust prediction of water arsenic levels downstream of gold mines affected by acid mine drainage using hybrid ensemble machine learning and soft computing. J. Hazard. Mater. 489, 137665 (2025).
39. Arab Khaburi, M. & Mortazavi, A. Slope stability analysis of sarcheshmeh copper mine west wall under seismic loads. Geotech. Geol. Eng. 37, 3141–3155 (2019).
40. Xu, X. et al. Ultimate pit optimization with environmental problem for open-pit coal mine. Process Saf. Environ. Prot. 173, 366–372 (2023).
41. Cotrina-Teatino, M. A., Marquina-Araujo, J. J. & Riquelme, Á. I. Comparison of machine learning techniques for mineral resource categorization in a copper deposit in Peru. Natural Resources Research, 1–19 (2025).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

All data generated or analysed during this study are included in this published article.

[CR1] 1.Hosseinpour, M., Osanloo, M. & Azimi, Y. Evaluation of positive and negative impacts of mining on sustainable development by a semi-quantitative method. J. Clean. Prod.366, 132955 (2022). [Google Scholar]

[CR2] 2.Bakhtavar, E., Saberi, S., Hu, H., Sadiq, R. & Hewage, K. Fuzzy cognitive-based goal programming for waste rock management with in-pit dumping priority: Towards sustainable mining. Resour. Policy86, 104095 (2023). [Google Scholar]

[CR3] 3.Alshibani, A. et al. Advancing sustainability: An integrated decision support framework for fleet selection in open pit mining construction. Res. Eng.23, 102501 (2024). [Google Scholar]

[CR4] 4.Mwangi, A. D., Jianhua, Z., Gang, H., Kasomo, R. M. & Innocent, M. M. Ultimate pit limit optimization methods in open-pit mines: A review. J. Min. Sci.56, 588–602 (2020). [Google Scholar]

[CR5] 5.Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Determining the environmental costs of mining projects: A comprehensive quantitative assessment. Resour. Policy82, 103561 (2023). [Google Scholar]

[CR6] 6.Marquina-Araujo, J. J., Cotrina-Teatino, M. A., Mamani-Quispe, J. N., Ccatamayo-Barrios, J. H., Ortiz-Quintanilla, S. M., Antonio-Araujo, E., & Portilla-Rodriguez, H. R. Delimitation of the final pit in open pit mines using the pseudoflow maximum flow algorithm: A comparative analysis of 1× 5 and 1× 9 Arcs. Mathematical Modelling of Engineering Problems, 11(7), (2024).

[CR7] 7.Quelopana, A. & Navarra, A. Integration of strategic open-pit mine planning into hierarchical artificial intelligence. J. South Afr. Inst. Min. Metall.121(12), 643–652 (2021). [Google Scholar]

[CR8] 8.Akbari, P., Valencia, S. & Morales, N. Automating and optimising pushback selection using reinforcement learning. Min. Technol.134(4), 274–288 (2025). [Google Scholar]

[CR9] 9.Deutsch, M., Dağdelen, K. & Johnson, T. An open-source program for efficiently computing ultimate pit limits: Mineflow. Nat. Resour. Res.31(3), 1175–1187 (2022). [Google Scholar]

[CR10] 10.Díaz, A. B., Álvarez, I. D., Fernández, C. C., Krzemień, A. & Rodríguez, F. J. I. Calculating ultimate pit limits and determining pushbacks in open-pit mining projects. Resour. Policy72, 102058 (2021). [Google Scholar]

[CR11] 11.Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Evaluating the impacts of the transition from open-pit to underground mining on sustainable development indexes. J. Sustain. Min.22(2), 154 (2023). [Google Scholar]

[CR12] 12.Saleki, M., Kakaie, R. & Ataei, M. Mathematical relationship between ultimate pit limits generated by discounted and undiscounted block value maximization in open-pit mining. J. Sustain. Min.18(2), 94–99 (2019). [Google Scholar]

[CR13] 13.Liu, F. et al. Open pit limit optimization considering the pumped storage benefit after mine closure: A case study. Geomecha. Geophys. Geo-Energy Geo-Resour.10(1), 44 (2024). [Google Scholar]

[CR14] 14.Badakhshan, N., Shahriar, K., Afraei, S. & Bakhtavar, E. Optimization of transition from open-pit to underground mining considering environmental costs. Resour. Policy95, 105178 (2024). [Google Scholar]

[CR15] 15.Marquina Araujo, J. J., Cotrina Teatino, M. A., Noriega Vidal, E. M. & Mamani Quispe, J. N. Delimitation of the final pit of open-pit mines using the maximum flow Pseudoflow method. Int. J. Min. Geo-Eng.58(1), 105–111 (2024). [Google Scholar]

[CR16] 16.Holloway, E. Risk in Ultimate Pit Selection. Min. Metall. Expl.41(2), 589–605 (2024). [Google Scholar]

[CR17] 17.Azadi, N., Mirzaei-Nasirabad, H. & Mousavi, A. Evaluating the efficiency of the genetic algorithm in designing the ultimate pit limit of open-pit mines. Int. J. Min. Geo-Eng.57(1), 55–58 (2023). [Google Scholar]

[CR18] 18.Mwangi, A. D., Jianhua, Z., Gang, H., Kasomo, R. M. & Matidza, I. M. Ultimate pit limit optimization using Boykov-Kolmogorov maximum flow algorithm. J. Min. Environ.12(1), 1–13 (2021). [Google Scholar]

[CR19] 19.Chatterjee, S., Sethi, M. R. & Asad, M. W. A. Production phase and ultimate pit limit design under commodity price uncertainty. Eur. J. Oper. Res.248(2), 658–667 (2016). [Google Scholar]

[CR20] 20.Adibi, N. & Ataee-pour, M. Consideration of sustainable development principles in ultimate pit limit design. Environ. Earth Sci.74, 4699–4718 (2015). [Google Scholar]

[CR21] 21.Rimélé, A., Dimitrakopoulos, R. & Gamache, M. A dynamic stochastic programming approach for open-pit mine planning with geological and commodity price uncertainty. Resour. Policy65, 101570 (2020). [Google Scholar]

[CR22] 22.Wang, S. et al. Dynamic optimization design of open-pit mine full-boundary slope considering uncertainty of rock mass strength. Sci. Rep.14, 19710 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23. Xu, X. et al. Production scheduling optimization considering ecological costs for open pit metal mines. J. Clean. Prod. 180, 210–221 (2018).

[CR24] 24. Mirzehi, M. & Moradi Afrapoli, A. A novel framework for integrating environmental costs and carbon pricing in open-pit mine plans: Towards sustainable and green mining. J. Clean. Prod. 468, 143059 (2024).

[CR25] 25. Huo, D., Sari, Y. A., Kealey, R. & Zhang, Q. Reinforcement learning-based fleet dispatching for greenhouse gas emission reduction in open-pit mining operations. Resour. Conserv. Recycl. 188, 106664 (2023).

[CR26] 26. Liu, S. Q., Liu, L., Kozan, E., Corry, P., Masoud, M., Chung, S. H. & Li, X. Machine learning for open-pit mining: A systematic review. Int. J. Min. Reclam. Environ. 1–39 (2024).

[CR27] 27. Choi, Y., Nguyen, H., Bui, X. N. & Nguyen-Thoi, T. Optimization of haulage-truck system performance for ore production in open-pit mines using big data and machine learning-based methods. Resour. Policy 75, 102522 (2022).

[CR28] 28. Badakhshan, N., Bakhtavar, E., Shahriar, K., Afraei, S. & Ben-Awuah, E. Unmanned mining fleet management: A multi-objective framework integrating deep reinforcement learning and internet of things. Expert Syst. Appl. 287, 128238 (2025).

[CR29] 29. Noriega, R., Pourrahimian, Y. & Askari-Nasab, H. Deep reinforcement learning based real-time open-pit mining truck dispatching system. Comput. Oper. Res. 173, 106815 (2025).

[CR30] 30. Hazrathosseini, A. & Moradi Afrapoli, A. Transition to intelligent fleet management systems in open-pit mines: A critical review on application of reinforcement-learning-based systems. Min. Technol. 133(1), 50–73 (2024).

[CR31] 31. Goldstein, D. M., Aldrich, C. & O’Connor, L. A review of orebody knowledge enhancement using machine learning on open-pit mine measure-while-drilling data. Mach. Learn. Knowl. Extr. 6(2), 1343–1360 (2024).

[CR32] 32. Krop, I. et al. Assessment of selected machine learning models for intelligent classification of flyrock hazard in an open-pit mine. IEEE Access 12, 8585–8608 (2024).

[CR33] 33. Cao, P. et al. Inversion of mine ventilation resistance coefficients enhanced by deep reinforcement learning. Process Saf. Environ. Prot. 182, 387–404 (2024).

[CR34] 34. Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992). doi:10.1007/BF00992698

[CR35] 35. Alemayehu, E. et al. Optimizing design and stability of open pit slopes in Tolay coal mine, Ethiopia. Sci. Rep. 15, 1570 (2025).

[CR36] 36. Ugurlu, O. F., Fan, C., Jiang, B. & Liu, W. V. Deep neural network models for improving truck productivity prediction in open-pit mines. Min. Metall. Expl. 41(2), 619–636 (2024).

[CR37] 37. Zhou, Q. et al. An optimized Q-learning algorithm for mobile robot local path planning. Knowl.-Based Syst. 286, 111400 (2024).

[CR38] 38. Bakhtavar, E., Hosseini, S., Main, H., Hewage, K. & Sadiq, R. Robust prediction of water arsenic levels downstream of gold mines affected by acid mine drainage using hybrid ensemble machine learning and soft computing. J. Hazard. Mater. 489, 137665 (2025).

[CR39] 39. Arab Khaburi, M. & Mortazavi, A. Slope stability analysis of Sarcheshmeh copper mine west wall under seismic loads. Geotech. Geol. Eng. 37, 3141–3155 (2019).

[CR40] 40. Xu, X. et al. Ultimate pit optimization with environmental problem for open-pit coal mine. Process Saf. Environ. Prot. 173, 366–372 (2023).

[CR41] 41. Cotrina-Teatino, M. A., Marquina-Araujo, J. J. & Riquelme, Á. I. Comparison of machine learning techniques for mineral resource categorization in a copper deposit in Peru. Nat. Resour. Res. 1–19 (2025).
