Research on intelligent energy management strategies for connected range-extended electric vehicles based on multi-source information

Xuewen Zhai; Hanwu Liu; Wencai Sun; Zihang Su

doi:10.1038/s41598-025-97955-8

2025 Apr 14;15:12758.

PMCID: PMC11997067 PMID: 40229519

Abstract

Relying solely on vehicle-specific information while neglecting multi-source information such as traffic flow and traffic light status makes it difficult to optimize energy allocation under complex road conditions. To apply multi-source traffic information and improve the timeliness of multi-objective optimization (MOO) for connected automated range-extended electric vehicles (CAR-EEV), this paper proposes an intelligent energy management strategy (EMS) from a multi-objective perspective. First, a joint simulation platform for traffic scenarios is established based on SUMO and MATLAB, and a data-driven model of CAR-EEV is constructed from collected data, serving as the data foundation and operational platform for the subsequent research and development of EMSs. Then, using an image-like representation of traffic flow information based on grid grayscale maps, multi-source traffic information is encoded as a two-dimensional matrix. The Euclidean distance between consecutive traffic scenario matrices is used as the similarity measure for optimizing and predicting future vehicle speeds. Moreover, a multi-objective intelligent EMS based on deep reinforcement learning (DRL) is employed, using the Deep Deterministic Policy Gradient (DDPG) algorithm to jointly consider vehicle dynamics, energy consumption economy, and battery degradation. This establishes an end-to-end intelligent EMS framework for CAR-EEV, and training convergence is accelerated through prioritized experience replay. Finally, simulations and bench tests demonstrate that this intelligent EMS significantly improves vehicle dynamics and battery life while notably enhancing real-time performance and effectiveness.

Keywords: Connected automated range-extended electric vehicle, Energy management strategy, Traffic information, Vehicle speed prediction, Deep reinforcement learning

Subject terms: Engineering, Electrical and electronic engineering, Mechanical engineering

Introduction

In the face of the energy crisis and environmental pollution, CAR-EEV have been highly praised for their long endurance, high efficiency, and environmental friendliness¹. However, the complexity of the Auxiliary Power Unit (APU) system and driving conditions has increased the difficulty in developing EMSs. Currently, energy-saving optimizations in EMSs are achieved by comprehensively considering factors such as traffic flow, along with vehicle-specific attributes². Thus, the efficient and intelligent utilization of multi-source traffic information is also pivotal in the research endeavors aimed at energy management for CAR-EEV.

Literature review

EMSs are designed to allocate power output among different energy sources based on control objectives, while satisfying multiple constraints, meeting vehicle power demands, and enhancing vehicle performance³. The primary research questions addressed in this paper are twofold: firstly, the issue of energy management in CAR-EEV, and secondly, the integration and application of traffic information within energy management. The EMSs for hybrid electric vehicles, which govern the interaction between the battery as the primary energy source and the APU, can be broadly categorized into three types: rule-based strategies⁴, optimization-based strategies⁵, and learning-based strategies⁶. Rule-based EMSs are widely employed in controllers of current commercial vehicles, with the most representative examples being logic threshold strategies and fuzzy logic strategies. Although rule-based strategies offer advantages such as simplicity, reliability, and low computational demand, their control rules suffer from limited universality and optimality⁷, which has prompted extensive research into optimization-based strategies. Optimization-based strategies can be further divided into global optimization strategies and real-time optimization strategies. Global optimization methods, which primarily include Dynamic Programming (DP)⁸ and Pontryagin’s Minimum Principle (PMP)⁹, are generally utilized solely as benchmarks for evaluating other algorithms. In contrast, real-time optimization strategies transform global optimization problems into instantaneous or local optimization problems, performing optimization calculations for energy management decisions online to achieve real-time optimization. Currently, well-acknowledged real-time optimization methods encompass the Equivalent Consumption Minimization Strategy (ECMS)¹⁰ and Model Predictive Control (MPC)¹¹.

In recent years, Reinforcement Learning (RL) has emerged as a research hotspot due to its model-free nature and strong adaptability¹². Additionally, it has demonstrated the capability to adapt to various driving conditions¹³. However, in the field of energy management for hybrid electric vehicles, RL algorithms are subject to the “curse of dimensionality” and to discretization errors, which hinder their ability to optimize strategies¹⁴. Existing batteries are plagued by issues such as short lifespan and high manufacturing costs. Current strategies overlook the impact of SoC fluctuations and depth of charge/discharge on battery cycle life, sacrificing long-term durability for short-term efficiency and thereby increasing hidden usage costs¹⁵. The application of DRL in EMSs integrates deep learning and RL to enhance the ability to handle high-dimensional state-action spaces, optimize hybrid vehicle energy management, reduce energy consumption, stabilize the battery state of charge (SoC), and shorten computation time¹⁶. Because of the short battery life and high manufacturing costs noted above, an increasing number of studies have focused on the lifespan of energy storage components such as power batteries, incorporating it as one of the objectives in energy management optimization. The aforementioned research has significantly contributed to the advancement of EMSs for hybrid electric vehicles. At the level of environmental perception, however, mainstream strategies rely excessively on single-vehicle sensors, leading to the phenomenon of “information silos”. In urban traffic conditions, for every 100 vehicles per hour increase in traffic flow density, electricity consumption rises by 15–20%. Existing strategies are unable to predict the relationship between queue lengths at intersections and signal phase timings, resulting in ineffective acceleration before red lights or frequent starts and stops. Electric vehicles not connected to V2X consume 28% more energy on road segments with dense traffic signals, highlighting the need for multi-source information fusion¹⁷. With the increasing ability to obtain traffic information in a connected environment, leveraging traffic information to enhance the performance of EMSs has emerged as an important direction for development. The application and development of EMSs have significantly improved hybrid electric vehicle energy management. However, driving condition information also directly influences the output of EMSs, and the performance of these strategies largely depends on the driving conditions. Currently, there are generally two methods for obtaining vehicle driving condition information¹⁸. One applies various prediction algorithms to forecast driving conditions over a future time period through the analysis of historical driving data. The other incorporates traffic information into the prediction of vehicle driving conditions, using prediction algorithms to obtain future driving condition information.

Prediction of driving conditions based on historical driving data involves extracting features from standard driving cycles and recent driving data of vehicles, and then applying computational methods such as Markov models and artificial neural networks to predict future driving conditions. Based on Markov theory, the Markov method predicts state changes at future times based on the current state¹⁹. Since vehicle motion is a dynamic driving process based on transitions in acceleration and speed states, it can be readily modeled as a Markov process. This modeling approach offers a simple structure and accounts for the uncertainties in driving. Prediction of driving conditions based on traffic information is conducted with consideration of the interconnected nature among humans, vehicles, and the environment²⁰. Drivers and vehicles are closely related to the traffic environment, which directly impacts vehicle movement. Consequently, numerous researchers have conducted studies on the relationship between the traffic environment and driving conditions. The incorporation of traffic information has not only improved the accuracy of vehicle driving condition predictions but also enhanced the optimization performance of EMSs. However, most current research on EMSs based on traffic information focuses on single scenarios. Historical data-driven methods suffer from spatiotemporal adaptability issues, with an energy allocation error rate as high as 42% when lanes are temporarily closed. Most strategies rely solely on vehicle-specific information, neglecting diverse traffic sources, making it difficult to optimize energy allocation in real-time. Furthermore, methods based on historical data lack consideration for real-time adaptability and diversity, resulting in poor self-adaptability. Therefore, developing EMSs with higher generalization and multi-scenario adaptability is an urgent issue that needs to be addressed²¹.

CAR-EEVs with V2X need to flexibly allocate energy between the battery and range extender for optimal efficiency and environmental performance. However, current research mainly focuses on vehicle-specific information, ignoring external factors like traffic flow and signal status. This “single-dimensional perception” in energy management has flaws²². Traditional strategies use only on-board data such as speed and SoC, lacking V2X two-way data flow with traffic infrastructure. As a result, at signal-dense intersections, vehicles without intelligent system connection waste 32% of energy on ineffective acceleration and braking because they cannot predict green-light time. Also, the multimodal features of dynamic traffic are not well analyzed, leading to homogeneous energy allocation. Range extenders with a unified strategy consume 18% more fuel in congestion and have 11% power-generation redundancy on clear roads²³. Besides, current control algorithms poorly understand the spatiotemporal evolution of traffic flow. Future research should build a multi-source data-integrated decision-making framework and a traffic-energy-flow coupled RL model. This can achieve millisecond-level power reconfiguration. For instance, predicting road topology changes 200 meters ahead allows the range extender to be adjusted 0.8 s in advance, boosting efficiency by over 15%. Vehicle-road coordinated energy management is key to overcoming current bottlenecks²⁴.

Motivation and innovation

In the context of connected vehicle environments, this paper creates a data-driven CAR-EEV model, integrates it with SUMO-MATLAB traffic simulation, and proposes a vehicle speed prediction method based on historical traffic scenario retrospection, addressing issues such as the utilization of multi-source traffic information and the timeliness of MOO. Additionally, it designs a multi-objective intelligent EMS that integrates energy consumption and battery life degradation. The innovations of this paper are as follows: (1) The vehicle speed prediction method uses grid grayscale images and Euclidean distance comparisons, validated through scenario analysis. (2) The multi-objective intelligent EMS for CAR-EEV uses a DDPG algorithm to optimize energy and battery life, enhanced by prioritized experience replay. (3) In-the-loop tests and strategy bench tests based on the Speedgoat platform confirm the EMS’s real-time performance and effectiveness.

Modeling of powertrain systems and vehicle speed prediction for CAR-EEV

The precise modeling of powertrain systems and their components in CAR-EEV, which are electromechanical hybrid systems powered jointly by multiple energy sources, is the foundation for studying EMSs. In this chapter, leveraging AVL, detailed modeling of the powertrain systems and their components in CAR-EEV will be conducted. The modeling principles of the vehicle’s powertrain system and its key components will be introduced, encompassing the establishment of models such as dynamic models, APU models, battery models, DC/DC converter models, and drive motor models. Additionally, a brief analysis of the degradation mechanisms of batteries and traction batteries will be provided, with corresponding lifetime degradation models being established. Furthermore, the establishment of traffic simulation scenario models serves as the foundation for subsequent research on vehicle speed prediction utilizing traffic information.

Modeling of powertrain simulation environments and modeling transportation systems

The topological structure of the powertrain system for CAR-EEV is illustrated in Fig. 1.

In this paper, traffic scenario models are established for traffic simulation based on SUMO, and a co-simulation platform is constructed with MATLAB through the TraCI interface. Within this model, each vehicle is equipped with car-following and lane-changing models to facilitate the selection of appropriate routes to reach its destination²⁵. Based on real-vehicle data, a traffic network model has been established in SUMO to comprehensively simulate the traffic conditions across all road segments. The dynamic behavior of vehicles on the roads is reflected by their speed, acceleration, and driving states, with vehicle state data being collected in real time through onboard sensors and GPS devices. Driving style data is derived by analyzing indicators such as changes in vehicle acceleration, following distances, and overtaking frequencies, which reveal the unique driving habits and behavioral characteristics of drivers. Environmental feature data integrates traffic signal timing and road conditions. To simplify the model and reduce computational load, two assumptions are made: (1) the timing of traffic signals is fixed and does not vary with traffic flow; (2) the dwell time of vehicles at stations is uniformly set to 10 s.

The car-following model utilized in this paper is the Krauss model, which is a type of safe distance model. The Krauss model, as a classic car-following model, is characterized by its good accuracy and applicability, and is capable of comprehensively reflecting the following characteristics of vehicles during actual driving, which aligns with the driving styles observed in real-driving data. Compared to lane-changing models, the Krauss model is more suitable for studying the following behavior of vehicles during straight-line driving and is therefore adopted. By comparing simulation results with real-driving data, the Krauss model demonstrates high consistency in terms of vehicle following behavior, verifying its applicability to the research. Additionally, it can be adjusted and optimized according to specific circumstances in practical applications. In this model, a safe vehicle distance is maintained between the preceding and following vehicles. When the motion state of the preceding vehicle changes, the following vehicle adjusts its behavior accordingly based on vehicle dynamics to avoid collisions and ensure the normal operation of traffic flow. The fundamental principles of the model are outlined as follows:

The maximum safe following speed is calculated as shown in Eq. (1), written here in the standard Krauss form:

$$v_{\mathrm{safe}} = v_l + \frac{g - v_l\,t_r}{\dfrac{v_l + v_f}{2b} + t_r} \tag{1}$$

where $v_{\mathrm{safe}}$ is the maximum safe following speed; $b$ is the maximum deceleration; $t_r$ is the driver’s reaction time; $v_l$ is the speed of the preceding vehicle; $v_f$ is the current speed of the following vehicle; $g$ is the distance between vehicles.

By comparing the safe speed, maximum speed, road speed limit, and the speed at the next moment, the available speed for the following vehicle is obtained, as shown in Eq. (2):

$$v_{\mathrm{des}} = \min\{v_{\mathrm{safe}},\ v_{\max},\ v_{\lim},\ v_{\mathrm{next}}\} \tag{2}$$

where $v_{\mathrm{des}}$ is the available speed of the following vehicle; $v_{\max}$ is the maximum speed; $v_{\lim}$ is the road speed limit; $v_{\mathrm{next}}$ is the speed at the next moment.

Due to factors such as each vehicle’s individual performance characteristics and variations in driving styles among operators, the vehicle speeds calculated above deviate from actual speeds. Consequently, a random factor is introduced to compute the final vehicle speed $v$, as shown in Eq. (3):

$$v = \max\{0,\ v_{\mathrm{des}} - \eta\} \tag{3}$$

where $v$ is the final speed; $\eta$ is the random factor.

Vehicle positions are updated with the Euler numerical integration method, which is the default in SUMO.
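The car-following update described above can be sketched as a single simulation step. This is a minimal illustration under stated assumptions, not SUMO's implementation: the function name `krauss_step`, all default parameter values, and the way the random factor is drawn (a uniform fraction of the acceleration capability scaled by a hypothetical `sigma`) are choices made here for illustration only.

```python
import random


def krauss_step(x_f, v_f, x_l, v_l, *, b=4.5, t_react=1.0, accel=2.6,
                v_max=33.3, v_lim=13.9, sigma=0.5, veh_len=5.0, dt=1.0,
                rng=random):
    """Advance the following vehicle by one step (Krauss-style sketch).

    x_f, v_f: position [m] and speed [m/s] of the follower.
    x_l, v_l: position [m] and speed [m/s] of the leader.
    All keyword defaults are illustrative assumptions.
    """
    # Net distance between the follower's front and the leader's rear.
    gap = max(0.0, x_l - veh_len - x_f)
    # Safe speed: the follower can still stop if the leader brakes hard.
    v_safe = v_l + (gap - v_l * t_react) / ((v_l + v_f) / (2.0 * b) + t_react)
    # Available speed: the most restrictive of the four bounds.
    v_des = min(v_safe, v_max, v_lim, v_f + accel * dt)
    # Random factor modelling imperfect driving (assumed uniform draw).
    eta = sigma * accel * dt * rng.random()
    v_new = max(0.0, v_des - eta)
    # Euler update: the new speed is held constant over the step.
    x_new = x_f + v_new * dt
    return x_new, v_new
```

With `sigma=0.0` the update is deterministic, which makes it easy to check that a follower approaching a stopped leader never advances past the available gap.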

Vehicle speed prediction based on multi-source traffic information

In vehicle speed prediction, individual vehicles exhibit complex temporal correlations without direct spatial dependencies, which makes speed prediction challenging. In urban systems, however, vehicle travel patterns demonstrate certain regularities in both temporal and spatial dimensions. Based on this observation, this paper proposes a vehicle speed prediction method that backtracks through historical traffic scenario information and uses vehicles’ historical operating conditions to predict speeds. The traffic information of all vehicles on a fixed-length road segment is collected and processed into data whose dimensions do not vary with the individual vehicles present. However, two challenges remain: traffic scenarios are complex, so multidimensional data is difficult to use directly, and spatial comparisons across different traffic scenarios using image recognition are inaccurate. To address these issues, this paper proposes an image-like representation of traffic flow based on grid grayscale images. Building upon the spatial distribution of vehicles, this representation uses the individual speeds of vehicles to provide a secondary expression of traffic scenarios in the form of speed grayscale images. Each grid in the grayscale image can convey four types of information: the lane occupied by the vehicle, its distance from a reference point, the vehicle type, and its traveling speed. This approach resolves the challenge of using multidimensional data that was previously difficult to leverage. The traffic flow state of the studied road segment can be represented by Eq. (4)²⁶. The grid grayscale map can be viewed as a spatial vector matrix, which addresses the issue of inaccurate spatial comparisons across different traffic scenarios by enabling the calculation of matrix similarity.

where $F$ is the overall traffic flow state of the studied road segment; $n$ is the lane index (the road studied here has three lanes, so $n$ = 1, 2, 3); and $c$ is the state of a vehicle within the grid grayscale map.

The vehicle state is defined by the following parameters: $x_s$ is the starting position of the vehicle within the grid; $x_e$ is the ending position of the vehicle within the grid; and $v$ is the speed of the vehicle.

Fig. 2 illustrates the traffic flow representation method based on image-like grid grayscale maps. The specific steps involved in creating such grid grayscale maps for traffic flow are as follows: (1) Road segments of length $L$ are taken as the research subject to obtain traffic flow information. These segments are discretized vertically by lane and horizontally by units of 1-m length to establish a grid map. (2) Based on the position and size of vehicles in actual traffic flow, the corresponding positions of each vehicle are depicted within the grid map. Grids that are not fully occupied by vehicles are treated as complete, as illustrated by the three-lane position information shown in the middle of Fig. 2. (3) The grids corresponding to the positions of vehicles are filled based on the actual speeds of the vehicles in the traffic flow. When there are no vehicles in a grid, it remains uncolored to indicate the absence of vehicles. (4) After extracting vehicle position, size, and speed information, the traffic flow information is represented as shown in the bottom three-lane speed grayscale grid information in Fig. 2.
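The grid construction steps can be sketched as follows. This is an illustrative sketch only: the function name `build_speed_grid`, the tuple layout of a vehicle record, the zero-based lane index, and the sentinel value `EMPTY` used for uncolored cells are assumptions introduced here, not details taken from the paper.

```python
import math

# Assumed sentinel for cells with no vehicle, chosen so that an empty cell
# can be told apart from a cell occupied by a stopped vehicle (speed 0).
EMPTY = -1.0


def build_speed_grid(vehicles, n_lanes=3, seg_len_m=100):
    """Build a lane-by-metre speed matrix for one road segment.

    vehicles: iterable of (lane, x_start, x_end, speed), where lane is a
    zero-based index and positions are metres from the segment start.
    """
    grid = [[EMPTY] * seg_len_m for _ in range(n_lanes)]
    for lane, x_start, x_end, speed in vehicles:
        # Partially covered cells count as fully occupied, so the start
        # is rounded down and the end is rounded up.
        first = max(0, math.floor(x_start))
        last = min(seg_len_m, math.ceil(x_end))
        for cell in range(first, last):
            grid[lane][cell] = speed
    return grid
```

Each row of the returned matrix corresponds to one lane and each column to one metre of road, so two scenarios on the same segment always produce matrices of the same shape regardless of how many vehicles are present.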

After traffic flow information is processed into grid grayscale maps, a database is established to support subsequent speed predictions. It comprises the following: (1) a collection of historical traffic grid grayscale maps, denoted as $U$, recorded upon vehicles’ initial entry into the study road segment; (2) the historical speeds ($v_i$) of the study vehicles corresponding to each historical traffic state ($S_i$) within $U$; (3) the travel times ($t_i$) of the study vehicles along the road segment corresponding to each historical traffic state ($S_i$) within $U$. Among them, the historical traffic image set $U$ serves as the foundation for assessing the degree of similarity between historical scenarios and the current scenario. The determination of similarity between the historical traffic states ($S_i$) and the current traffic state ($S_c$) directly influences the effectiveness of subsequent speed predictions. The final output of the speed prediction is denoted as $v_{\mathrm{pre}}$. Meanwhile, $t_i$, as the prediction time horizon, is incorporated into the subsequent speed prediction process to enhance the prediction effectiveness. Compared to methods that set the prediction time horizon to a fixed value, this variable prediction time horizon makes fuller use of the temporal variation characteristics of vehicle speeds, thereby further improving the accuracy of speed predictions.

where $d(S_c, S_i) = \lVert S_c - S_i \rVert_2$ is the Euclidean distance between $S_c$ and $S_i$; $S_c$ is the current traffic state; $S_i$ is the historical traffic state in the database $U$; $v_i$ is the historical speed associated with the selected $S_i$; $v_{\mathrm{pre}}$ is the future speed under the current traffic state $S_c$.
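The retrieval step described above amounts to a nearest-neighbour lookup over the historical database. The sketch below assumes each database record is a tuple of (grid matrix, speed trace, travel time) and that all matrices share the same shape; the names `euclidean_distance` and `predict_speed` are hypothetical.

```python
import math


def euclidean_distance(grid_a, grid_b):
    """Euclidean distance between two equally shaped grid matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(grid_a, grid_b)
                         for a, b in zip(row_a, row_b)))


def predict_speed(current_grid, database):
    """Return the speed trace of the most similar historical scenario.

    database: list of (grid, speed_trace, travel_time) records.
    The travel time of the matched record doubles as the prediction
    horizon, so the horizon varies from scenario to scenario.
    """
    grid, speeds, travel_time = min(
        database, key=lambda rec: euclidean_distance(current_grid, rec[0]))
    distance = euclidean_distance(current_grid, grid)
    return speeds, travel_time, distance
```

A linear scan is adequate for a database of a few dozen scenarios; a larger database would likely call for an indexed search, which is outside the scope of this sketch.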

Intelligent EMS based on DDPG with MOO

To comprehensively consider the factors of energy consumption economy and battery life degradation in CAR-EEV, a MOO-based EMS leveraging DDPG has been proposed, with the introduction of DRL methods. This strategy integrates the strong non-linear fitting capability and intelligent decision-making ability of DRL. Additionally, prioritized experience replay has been employed to enhance learning performance and accelerate convergence.

A review of the principles of DRL algorithms

Q-Learning (QL) is the most commonly utilized algorithm within the realm of RL. DRL algorithms can be further categorized into Deep Q-Network (DQN), Double Deep Q-Network (DDQN), DDPG, Twin Delayed DDPG (TD3), and Soft Actor-Critic (SAC). DRL algorithms based on DQN and Actor-Critic (AC) (namely, DDPG, TD3, and SAC) have been widely applied in the field of EMSs. The DDPG algorithm, which adopts the AC framework, uses an actor to output continuous action variables, which are then evaluated by a critic network, so that continuous action spaces can be handled. To enhance stability and convergence, DDPG integrates DQN and Deterministic Policy Gradient methods: the critic network is updated using the Temporal Difference (TD) method of DQN, and the actor network is updated using the Deterministic Policy Gradient approach. The output action is set to $a_t = \mu(s_t \mid \theta^{\mu})$, enabling continuous action output and interaction with the vehicle. The critic network consists of a main network (with parameters denoted as $\theta^{Q}$) and a target network (with parameters denoted as $\theta^{Q'}$), while the actor network comprises a main network (with parameters denoted as $\theta^{\mu}$) and a target network (with parameters denoted as $\theta^{\mu'}$). The parameters $\theta^{Q'}$ and $\theta^{\mu'}$ are updated according to exponential smoothing, as shown in Eq. (8)²⁷:

$$\theta^{Q'} \leftarrow \tau\,\theta^{Q} + (1-\tau)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\,\theta^{\mu} + (1-\tau)\,\theta^{\mu'} \tag{8}$$

where $\tau$ is the target smoothing factor.

The Q-value function of the DDPG algorithm is learned through the Bellman equation, as shown in Eq. (9)²⁷, and the TD error is calculated using Eq. (10)²⁷. Ultimately, gradient descent is employed to minimize the loss function, as illustrated in Eq. (11)²⁷:

$$y_t = r_t + \gamma\, Q'\big(s_{t+1},\ \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\big) \tag{9}$$

$$\delta_t = y_t - Q(s_t, a_t \mid \theta^{Q}) \tag{10}$$

$$L(\theta^{Q}) = \frac{1}{N}\sum_{t} \delta_t^{2} \tag{11}$$

where $r_t$ is the reward; $\gamma$ is the discount factor; $N$ is the number of sampled transitions.
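The three update ingredients above (target smoothing, the Bellman target, and the squared TD loss) can be illustrated without a deep learning framework by treating network parameters as flat lists of floats. This is a sketch of the arithmetic only, under that simplifying assumption; the helper names are hypothetical and no actual network or gradient step is implemented.

```python
def soft_update(target_params, main_params, tau):
    """Exponentially smooth target parameters towards the main network."""
    return [tau * m + (1.0 - tau) * t
            for t, m in zip(target_params, main_params)]


def td_target(reward, next_q, gamma=0.99, done=False):
    """Bellman target; next_q is the target critic's value of the
    target actor's action in the next state."""
    return reward if done else reward + gamma * next_q


def critic_loss(q_values, targets):
    """Mean squared TD error over a batch, plus the per-sample errors."""
    errors = [y - q for q, y in zip(q_values, targets)]
    return sum(e * e for e in errors) / len(errors), errors
```

A small smoothing factor keeps the target networks changing slowly, which is what stabilizes the bootstrapped target used in the critic loss.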

By leveraging the powerful approximation capabilities of deep neural networks for nonlinear functions, DRL avoids the “curse of dimensionality” in addressing high-dimensional, continuous, and nonlinear tasks. This allows EMSs to incorporate multi-source traffic and vehicle state information, providing a foundation for the agent to recognize environmental changes and learn optimal control behaviors. With the integration of neural networks and high-dimensional state spaces, through the application of nonlinear activation functions, agents can flexibly fit complex structural information, handle diverse environments, comprehensively learn high-level abstract features, and enhance generalization and reasoning capabilities. Therefore, this paper selects the DDPG algorithm, which overcomes the limitations of discrete action spaces in DQL, to investigate multi-objective EMSs for CAR-EEV.

As illustrated in Fig. 3, the framework of DDPG and its application in the field of EMSs for hybrid electric vehicles are presented. The agent in the DDPG algorithm comprises an actor network and a critic network. At each time step, the actor network is guided to select actions based on the feedback of TD errors from the critic network, augmented with random noise (such as Ornstein–Uhlenbeck noise or Gaussian noise) to enhance exploration capabilities.

Priority experience replay

In the traditional experience replay mechanism, samples are selected at random from the experience pool $D_t$, which loses valuable information about the differences between samples and thereby affects the algorithm’s performance. Therefore, this paper adopts the prioritized experience replay mechanism, using the TD-error value as a measure of sample importance. To avoid overfitting of the neural network, samples are still drawn stochastically, with the sampling probability given by Eq. (12):

$$P(i) = \frac{p_i^{\alpha}}{\sum_{k} p_k^{\alpha}}, \qquad p_i = \frac{1}{\mathrm{rank}(i)} \tag{12}$$

where $p_i$ is the priority of sample $i$; $\alpha$ is the degree of prioritization used; $\mathrm{rank}(i)$ is the rank of sample $i$ when the replay memory is sorted by the absolute value of the TD error.

Prioritized experience replay alters the state distribution, which introduces bias. Consequently, importance-sampling weights are employed to mitigate this effect, as shown in Eq. (13):

$$w_i = \left(\frac{1}{N}\cdot\frac{1}{P(i)}\right)^{\beta} \tag{13}$$

where $N$ is the batch sample size; $\beta$ is the compensation coefficient.

The loss function is given by Eq. (14):

$$L = \frac{1}{N}\sum_{i} w_i\,\delta_i^{2} \tag{14}$$
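A rank-based prioritized sampler along these lines can be sketched in plain Python. The default values of `alpha` and `beta`, the function names, and the normalization of the weights by their maximum are assumptions made for this illustration; note that after that normalization the constant factor $1/N$ cancels, so the sketch does not depend on how $N$ is interpreted.

```python
import random


def rank_based_probabilities(td_errors, alpha=0.6):
    """Sampling probabilities from the rank of each |TD error|."""
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]), reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, idx in enumerate(order, start=1):
        # Largest |TD error| gets rank 1 and therefore the top priority.
        priorities[idx] = (1.0 / rank) ** alpha
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights, scaled so the largest weight is 1."""
    n = len(probs)
    weights = [(1.0 / (n * p)) ** beta for p in probs]
    w_max = max(weights)
    return [w / w_max for w in weights]


def sample_indices(probs, batch_size, rng=random):
    """Draw a batch of transition indices according to the priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


def weighted_loss(weights, td_errors):
    """Importance-weighted mean squared TD error."""
    return sum(w * e * e for w, e in zip(weights, td_errors)) / len(td_errors)
```

Frequently sampled transitions receive smaller weights, which compensates for their over-representation in the batch.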

Framework for MOO EMS based on DDPG

A MOO framework based on DDPG is designed to simultaneously maximize (or minimize) multiple conflicting objectives, establishing an adaptive learning paradigm for energy management in CAR-EEVs within a high-dimensional decision space. This framework constructs a state space representation through a deep neural network, which not only encompasses real-time kinematic parameters of the vehicle, but also deeply integrates multi-source environmental features, including dynamic traffic flow density and real-time signal phase information obtained via V2X communication, which are critical decision-making elements. In the design of the action space, the limitations of traditional discrete control modes are overcome by establishing a continuous power allocation domain, where key control variables such as the power generation of the range extender, battery charging and discharging currents, and energy recovery intensity are mapped into a continuous action space, enabling the policy output to possess fine-grained capabilities. The DDPG algorithm achieves policy iteration and evolution through an Actor-Critic dual-network architecture, where the Actor network outputs deterministic action policies, and the Critic network evaluates the long-term rewards of state-action pairs. MOO is implemented by constructing a reward function that is a weighted sum of multiple single-objective reward functions, each corresponding to an optimization objective. The weights are determined based on the problem requirements and the engineer’s experience to ensure a balanced strategy. As illustrated in the Fig. 4, the framework for the multi-objective EMS based on DDPG is presented, where the upper layer utilizes velocity prediction methods to obtain the vehicle’s future driving conditions, which serve as inputs to the state space of the EMS; the lower layer, in turn, employs the DDPG-based EMS proposed in this chapter to achieve optimal power distribution for CAR-EEV.

Fig. 4 — Multi-objective Energy Management Framework Based on DDPG.

The state space $S$ for the EMS based on DDPG is set as:

$$S = \{SoC,\ v,\ a\} \tag{15}$$

where $SoC$ is the state of charge; $v$ is the vehicle speed; $a$ is the vehicle acceleration.

The action, which is set as the continuously adjustable output power of the APU system, serves as an easily controllable variable in the EMS for battery hybrid electric vehicles, as shown in Eq. (16).

$$A = \{P_{\mathrm{APU}}\} \tag{16}$$

The reward function provides the feedback signal and therefore significantly influences the outcome of the algorithm. In addition to energy consumption, the durability of the battery must also be considered for CAR-EEV. To mitigate power variations, suppress excessive SoC levels, and extend the service life of the battery, MOO is conducted by incorporating factors such as battery power fluctuations and overcharging/over-discharging into the reward function²⁸. To minimize energy consumption, prevent lithium-ion batteries from overcharging and over-discharging, and enhance system longevity, the MOO reward function $R$ is defined as:

where $m_{\mathrm{fuel}}$ is the fuel consumption; $m_{\mathrm{bat}}$ is the equivalent fuel consumption of the battery; $\eta_{\mathrm{bat}}$ is the battery efficiency; $\omega_1$ is the weighting factor for battery consumption; $\omega_2$ is the weighting factor for battery SoC fluctuation; $\omega_3$ is the weighting factor for APU output power fluctuation; SoC₀ is the initial battery SoC.
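One way such a weighted-sum reward could be assembled from the quantities listed above is sketched below. The functional form is an assumption: the quadratic penalties, the division of the battery term by its efficiency, the default weights, and the name `moo_reward` are all chosen here for illustration and are not taken from the paper's reward equation.

```python
def moo_reward(m_fuel, m_bat, eta_bat, soc, d_p_apu, *, soc0=0.7,
               w_bat=1.0, w_soc=100.0, w_apu=0.01):
    """Illustrative multi-objective reward (assumed form).

    m_fuel:  fuel consumed in the step
    m_bat:   equivalent fuel consumption of the battery in the step
    eta_bat: battery efficiency, in (0, 1]
    soc:     current state of charge
    d_p_apu: change in APU output power since the previous step
    """
    energy_cost = m_fuel + w_bat * m_bat / eta_bat
    soc_penalty = w_soc * (soc - soc0) ** 2      # discourages SoC drift
    apu_penalty = w_apu * d_p_apu ** 2           # discourages power swings
    # The agent maximizes reward, so every cost enters with a minus sign.
    return -(energy_cost + soc_penalty + apu_penalty)
```

Because all three terms are costs, the reward is always non-positive, and tuning the weights trades fuel economy against battery wear and APU smoothness.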

The initial battery SoC is set to 0.7, and the reward function is configured to keep the battery SoC within the range of 0.6 to 0.8 to ensure efficient operation and prolong the lifespan of lithium-ion batteries. The hyperparameter configuration for the DDPG algorithm is as follows. The target weights were iteratively tuned based on the training performance. In the AC network architecture, both the actor and critic networks have three hidden layers with 300, 250, and 150 neurons respectively, using linear activation functions. The learning rate is 0.0001 for the actor network and 0.001 for the critic network. A discount factor of 0.99 is employed. The training process spans 200 epochs, with each batch containing 268 samples. The experience replay buffer has a capacity of 10,000 transitions.

Verification and results analysis

To validate the performance of the intelligent EMS proposed in this paper in terms of energy consumption economy and mitigating battery capacity degradation in CAR-EEV, the traffic simulation scenarios based on SUMO established previously were utilized to obtain vehicle driving condition information, which was then applied to the DDPG intelligent EMS within a continuous action space.

Simulation results and analysis of vehicle speed prediction

The training process is initiated by utilizing a simulation model to emulate the operational conditions of a 100m cyclic experiment, where the agent performs actions and collects data within the simulated environment. This collected data is subsequently used for the training of neural networks, particularly the Critic network in DRL, which is responsible for estimating the expected reward that the agent may obtain in the future, known as the action value. During the training process, the agent’s performance is evaluated by monitoring the relationship between the reward value obtained from each simulation (Episodes Reward) and the number of training episodes (Episodes Number), as well as the Critic network’s estimation of the action value (Episode Q0). The training terminates upon reaching a maximum of 500 episodes or when the agent achieves the target reward value of -30 for 20 consecutive episodes, at which point the policy and related parameters are saved. The indication of policy convergence is when Episode Q0 closely coincides with Episodes Reward. Upon completion of the training, the agent’s performance must undergo simulation validation in the Simulink environment to ensure its satisfactory performance in practical applications. This method visually demonstrates the agent’s behavior and effectively assesses its performance. A discussion and comparison of the proposed speed prediction method are conducted under various conditions of road information collection, with the aim of obtaining the relevant parameters and application scenarios that yield the optimal prediction performance. In order to evaluate the prediction accuracy of the proposed speed prediction method, the Root Mean Square Error (RMSE) is employed as an evaluation metric to assess the accuracy and effectiveness of the speed prediction model. Agent training progress is shown in Fig. 5.
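The stopping rule described above can be written as a small check run after every episode. The sketch takes the text literally (every one of the last 20 episode rewards must reach the target); a training tool might instead compare a moving average against the target, which is a different criterion. The function name and argument names are hypothetical.

```python
def should_stop(episode_rewards, target=-30.0, window=20, max_episodes=500):
    """Return True when training should terminate.

    Stops when the episode budget is exhausted, or when each of the most
    recent `window` episodes reached at least the target reward.
    """
    if len(episode_rewards) >= max_episodes:
        return True
    recent = episode_rewards[-window:]
    return len(recent) == window and all(r >= target for r in recent)
```

Requiring a run of consecutive successes guards against saving a policy after a single lucky episode.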

Traffic scene information is given a secondary representation based on grid grayscale images, which are abstracted and expressed as matrices, thereby converting the information directly into usable data. Euclidean distances between traffic scene matrices are then calculated to determine similarity, and the speed of the most similar historical scene is used to predict future vehicle speeds. All speed prediction studies in this paper are conducted on road segments in various scenarios. An example of the speed prediction results is presented in Fig. 6, which shows the outcomes for vehicles traveling in the traffic network for 3400 s (hereinafter referred to as the simulation duration) in a traffic simulation scenario built using SUMO. Specifically, the first 80% of the simulation duration (0-2720 s) served as the historical data set, comprising a total of 22 historical scenarios. The remaining 20% (2720-3400 s) was used as the test set, in which the speed prediction performance was evaluated across 4 traffic scenarios.
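The retrieval step described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy 2 × 2 matrices and the speed profiles are all hypothetical, and the Euclidean distance between two matrices is taken element-wise (the Frobenius norm of their difference).

```python
from typing import Sequence

import numpy as np

SIM_DURATION_S = 3400                      # simulation duration from the text
SPLIT_TIME_S = SIM_DURATION_S * 80 // 100  # first 80% is history: 2720 s


def most_similar_scene(current: np.ndarray, history: Sequence[np.ndarray]) -> int:
    """Return the index of the historical scene matrix closest to `current`."""
    # np.linalg.norm of a 2-D array defaults to the Frobenius norm.
    distances = [np.linalg.norm(current - past) for past in history]
    return int(np.argmin(distances))


def predict_speed(current, history, history_speeds):
    """Reuse the speed profile recorded for the most similar past scene."""
    return history_speeds[most_similar_scene(current, history)]


# Toy data (hypothetical): three stored scenes and their speed profiles in m/s.
history = [np.zeros((2, 2)), np.ones((2, 2)), np.full((2, 2), 5.0)]
history_speeds = [[10.0, 11.0], [20.0, 21.0], [30.0, 31.0]]
current_scene = np.full((2, 2), 0.9)

print(predict_speed(current_scene, history, history_speeds))
```

Because the prediction is a lookup, its quality depends on how well the stored scenes cover the situations met in the test period.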

Fig. 6 — Presentation of speed prediction results.

A detailed analysis of the scenarios is conducted to verify the effectiveness of this speed prediction method. Fig. 6 shows that, among the four tested scenarios, scenarios 3 and 4 exhibit superior speed prediction performance, with an RMSE below 2 m²/s². The prediction accuracy is defined as the proportion of test scenarios with superior speed prediction performance to the total number of test scenarios, and it is tracked as the simulation duration increases and historical traffic scenario data accumulates in the database. In the early stages of traffic simulation, because little historical traffic information has been collected in the database, the prediction results are notably random. As the total simulation duration increases and data accumulates, the prediction accuracy tends to stabilize. The results indicate that the prediction performs well on road segments with relatively regular speed variations.
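The two quantities used in this evaluation, the RMSE of a predicted speed trace and the share of test scenarios that fall below the RMSE threshold, can be computed as follows. This is a minimal sketch: the function names and the sample RMSE values are hypothetical, and only the threshold of 2 is taken from the text.

```python
import numpy as np


def rmse(predicted, actual) -> float:
    """Root mean square error between two equally long speed traces."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((p - a) ** 2)))


def prediction_accuracy(rmse_values, threshold: float = 2.0) -> float:
    """Share of test scenarios whose RMSE is below the threshold."""
    values = np.asarray(rmse_values, dtype=float)
    return float(np.mean(values < threshold))


# Hypothetical RMSE values for four test scenarios.
scenario_rmse = [3.1, 2.5, 1.2, 0.8]
print(prediction_accuracy(scenario_rmse))
```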

Analysis of MOO control

To validate the performance of various algorithms in terms of energy consumption and mitigating battery capacity fade for CAR-EEV, a comparison is conducted with EMSs based on DP, Charge Depleting-Charge Sustaining (CD-CS) rules and Charge Depleting-Blend (CD-Blend) rules. The performance comparison among these four EMSs verifies the effectiveness of the proposed multi-objective intelligent EMS (DDPG) in optimizing energy consumption and mitigating battery capacity fade. The algorithm parameters are designed strictly according to urban travel features. The 3400-second simulation duration equals the average urban commuting time, covering off-peak to peak, congestion and evacuation periods. The 0-75 km/h speed range matches the measured private-car speed distribution in the city (μ = 32 km/h, σ = 18 km/h), as statistically verified. The power system’s 50-kW rated power, based on mainstream range-extended vehicle models, ensures a power response close to that of a real vehicle. The initial SOC of 0.36 is set according to research on the electricity consumption of urban users’ daily trips, representing the typical battery state at the start of a commute. The CD-CS strategy activates the APU when SOC reaches 30%, mimicking traditional threshold control. The other rule-based strategy (CD-Blend) uses a power-mixing mode with a dynamic proportional-integral controller for electric-oil power coupling. This helps analyze how strategies affect key metrics. For example, the CD-CS rule reduces battery charge-discharge cycles by 42%, and CD-Blend cuts APU start-stop frequency by 38%. Regarding algorithm comparison, DDPG and DP share the Q-value function structure but differ in state-space discretization. DP uses a 40 × 80 grid, while DDPG uses a four-layer fully-connected network for continuous state mapping. Both use a 0.95 discount factor. The experiment, verified by 100 Monte Carlo repetitions with a 95% confidence interval, sets control variables carefully. This provides a strict benchmark for comparing deep reinforcement learning and traditional optimization algorithms, laying a quantitative groundwork for applying these strategies to real-world traffic scenarios.
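The threshold behavior of the CD-CS baseline can be written as a one-line rule. The sketch below is an assumption-laden illustration: the 30% threshold and the initial SOC of 0.36 come from the text, while the function name, the mode labels and the absence of hysteresis are simplifications introduced here.

```python
APU_ON_SOC = 0.30   # CD-CS activates the APU when SOC reaches 30% (from the text)
INITIAL_SOC = 0.36  # initial SOC used in the simulation (from the text)


def cd_cs_mode(soc: float, threshold: float = APU_ON_SOC) -> str:
    """Return "CD" (battery only) above the threshold, "CS" (APU on) otherwise.

    Assumption: a plain threshold with no hysteresis band.
    """
    return "CS" if soc <= threshold else "CD"


# The vehicle starts in charge-depleting mode and switches once SOC hits 30%.
for soc in (INITIAL_SOC, 0.33, 0.30, 0.28):
    print(f"SOC={soc:.2f} -> {cd_cs_mode(soc)}")
```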

The internal combustion engine (ICE) serves as a key energy source within a CAR-EEV, fully accounting for its fuel consumption²⁹. This section outlines the criteria for assessing fuel consumption and battery capacity deterioration³⁰. A stable SoC trajectory is advantageous for prolonging battery lifespan and for enhancing the efficiency of energy exchange between the battery and engine. The hybrid energy storage system harmonizes the functionalities of the APU and batteries, presenting a potent strategy to extend battery service life³¹. In this study, considering the electric motor’s efficiency, Eq. (20), following ref.²¹, specifies the oil-to-electric conversion loss rate (C_{fuel_ele}) and battery life degradation (Q_loss).

where η_ele represents the power generation efficiency of the generator; η_fuel denotes the efficiency between the fuel consumption and the effective power output of the engine; and ρ stands for the gasoline calorific value, which is 4.6 × 10⁷ J/kg. T_cyc is the total cycle time; I_bat(t) is the battery current; DOD is the depth of charge or discharge, DOD = 0.7; N_cyc is the number of life cycles; and Ah_cell is the cumulative capacity of the battery. I_{com_ovp} is the comprehensive performance evaluation index of the APU, taking into account energy consumption and battery life. This article uses the analytic hierarchy process to obtain a multi-objective evaluation matrix, and calculates the weight vector as [ω_C, ω_Q]^T = [0.46, 0.54]^T.
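For readers unfamiliar with the analytic hierarchy process, the sketch below shows how a weight vector is obtained from a pairwise comparison matrix as its normalized principal eigenvector. The paper does not report its judgement matrix, so the 2 × 2 matrix here is hypothetical and was chosen only so that the result reproduces the reported weights; the function name `ahp_weights` is also an assumption.

```python
import numpy as np


def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """Normalized principal eigenvector of a pairwise comparison matrix."""
    eigenvalues, eigenvectors = np.linalg.eig(pairwise)
    principal = int(np.argmax(eigenvalues.real))   # largest eigenvalue
    vector = eigenvectors[:, principal].real
    return vector / vector.sum()                   # weights sum to 1


# Hypothetical judgement: energy cost is rated 0.46/0.54 times as important
# as battery life. This matrix is NOT taken from the paper.
ratio = 0.46 / 0.54
pairwise = np.array([[1.0, ratio],
                     [1.0 / ratio, 1.0]])

weights = ahp_weights(pairwise)
print(np.round(weights, 2))
```

With only two criteria the matrix is always consistent, so no consistency-ratio check is needed; with three or more criteria that check would normally follow.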

Alongside the outcomes of the energy storage system, the simulation also evaluated system fuel consumption. It compared the 100 km equivalent fuel consumption (Ge) among the four control strategies, each adjusted for terminal SoC (representing SoC-corrected fuel efficiencies). These results and the battery degradation are shown in Table 1, contrasted against those obtained from an EMS based on DP. Because DP requires prior information and has a high computational cost, it is not suitable for real-time energy management and is therefore used only as a benchmark method³².

Table 1.

Test results of experimental test.

Strategy	C_{fuel_ele}	∆ C_{fuel_ele} (%)	Ge (L/100km)	∆Ge (%)	Q_loss (%)	∆Q_loss (%)	I_{com_ovp}
CD-CS	0.76	−	4.7	−	11.6	−	0.76
CD-Blend	0.75	− 1.32	4.5	− 4.26	10.2	− 13.79	0.88
DP	0.71	− 6.58	4.2	− 10.64	9.7	− 16.38	0.93
DDPG	0.72	− 5.26	4.3	− 8.51	10.4	− 10.34	0.91


The C_{fuel_ele} values for the EMS based on CD-CS rules and the DDPG-based EMS are 0.76 and 0.72, respectively. Compared to the CD-CS strategy, the DDPG strategy achieves a 5.26% reduction in equivalent energy consumption, against a 6.58% reduction for the DP strategy, so DDPG attains close to 80% of the energy consumption economy of DP. Ge leads to a similar conclusion: DDPG achieves better energy savings than the rule-based strategies. In terms of timeliness, the CD-CS strategy has the shortest calculation time (0.15 s) and thus the best real-time performance; the DDPG strategy has a calculation time of 0.18 s, which is slightly longer than CD-CS but still offers the potential for online real-time application. In terms of battery life, compared to the CD-CS strategy, CD-Blend results in a 13.79% reduction in battery lifetime degradation, while the DP strategy leads to a 16.38% decrease. The DP strategy performs best in this regard, as it is designed to operate the battery within high-efficiency ranges, thereby mitigating the degradation of battery lifetime. The DDPG strategy exhibits a 10.34% lower degradation than the CD-CS strategy, effectively controlling the degradation of the battery. Consequently, the DP strategy is approximately optimal in controlling battery lifetime degradation, while the DDPG strategy demonstrates good performance in managing battery degradation.
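The percentage changes quoted for the two energy columns of Table 1 follow directly from the tabulated values, with CD-CS as the baseline. The short sketch below recomputes them and the share of the DP saving recovered by DDPG; the dictionary layout and function name are illustrative choices, and only the numbers come from Table 1.

```python
# Energy columns of Table 1 (values from the paper).
TABLE_1 = {
    "CD-CS":    {"C_fuel_ele": 0.76, "Ge": 4.7},
    "CD-Blend": {"C_fuel_ele": 0.75, "Ge": 4.5},
    "DP":       {"C_fuel_ele": 0.71, "Ge": 4.2},
    "DDPG":     {"C_fuel_ele": 0.72, "Ge": 4.3},
}


def relative_change(value: float, baseline: float) -> float:
    """Relative change in percent with respect to the baseline."""
    return 100.0 * (value - baseline) / baseline


baseline = TABLE_1["CD-CS"]
for name in ("CD-Blend", "DP", "DDPG"):
    row = TABLE_1[name]
    d_c = relative_change(row["C_fuel_ele"], baseline["C_fuel_ele"])
    d_g = relative_change(row["Ge"], baseline["Ge"])
    print(f"{name}: dC = {d_c:.2f} %, dGe = {d_g:.2f} %")

# Share of the DP saving in C_fuel_ele that DDPG recovers.
ddpg_share = (relative_change(0.72, 0.76) / relative_change(0.71, 0.76))
print(f"DDPG reaches {100 * ddpg_share:.0f} % of the DP saving")
```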

The control strategy that has been developed shows an efficient outcome in terms of energy consumption and battery life. DP serves as an optimal offline baseline. Because the performance indicators differ in units and scale, normalization parameters must be calculated for them before the strategies are compared horizontally. The parameters are calculated as follows:

where η_ij is the normalized value of the j-th evaluation index for the i-th strategy; λ_ij is the index value; and λ^min_ij and λ^max_ij are the minimum and maximum values of the index, respectively.
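The symbol definitions above are consistent with ordinary min-max scaling, in which each index is mapped to (λ − λ^min) / (λ^max − λ^min). The sketch below applies that form to the Ge column of Table 1. It is an assumption that the paper uses exactly this form (a cost-type index could equally be scaled as (λ^max − λ) / (λ^max − λ^min)), and the function name is hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale one evaluation index across strategies into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:              # all strategies equal: avoid dividing by 0
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Ge column of Table 1 in the order CD-CS, CD-Blend, DP, DDPG (L/100km).
ge_values = [4.7, 4.5, 4.2, 4.3]
ge_normalized = min_max_normalize(ge_values)
print([round(x, 2) for x in ge_normalized])
```

After scaling, indices with different units (L/100km, percent) can be placed on the same axis, which is what a side-by-side comparison of the strategies requires.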

The data used for the normalization analysis comprise the simulation results obtained under the aforementioned strategies. The statistical results obtained from normalization are shown in Fig. 7.

Fig. 7 — Degradation of power battery life.

The proposed EMS based on DDPG approximates the globally optimal EMS based on DP in terms of energy consumption economy. Additionally, it exhibits a more balanced performance in mitigating the degradation of battery capacity compared to both the DP-based and CD-CS rule-based EMSs. This strategy limits the lifetime degradation of the battery while ensuring economic efficiency. By simulating different operating conditions, the advantages of the DDPG in energy cost savings, battery life control, and control reliability under various driving conditions were verified. The results are shown in Fig. 8. Serving as the primary power source for battery hybrid electric vehicles, the APU system fulfills the basic power requirements of the vehicle. Together with the battery, the APU gains response capability, which mitigates load power fluctuations and extends the lifespan of the APU. In this paper, the EMS based on DDPG takes into account the lifespan of the battery by incorporating a battery SoC maintenance factor into the reward function.

Fig. 8 — Comparison of output characteristics of different algorithms.

The vehicle’s operating modes under the DDPG are identified by numbers. In CD-EV mode (number 1), the vehicle is driven primarily by electric power, while the CD strategy is used to manage the battery charge when power is sufficient. In CD-Blend mode (number 2), the APU system engages so that the two energy sources are used in a more balanced way; as the battery power dwindles, the APU system provides a larger share of the power while a percentage of battery power is still maintained. In CS-Blend mode (number 3), the CS strategy is employed to keep the battery charge at a relatively constant level, which means that the engine is more involved in the drive while charging the battery or keeping its charge constant. In DC mode (number 4), the vehicle charges the battery while driving; this usually occurs when congested road conditions are predicted, and the battery is replenished in advance by charging on the move. As depicted in Fig. 8, the battery SoC fluctuates around 0.3–0.4; the CD-CS strategy exhibits the largest SoC fluctuations and the most drastic changes, leading to the most significant degradation in battery lifespan. Figure 8 also presents the distribution of battery current and power under three EMSs. The DDPG strategy exhibits the highest number of high-efficiency operating points, which aligns with the rules set forth by the strategy. In contrast, DDPG, with the aim of minimizing fuel consumption, distributes more operating points within the range of lower efficiencies, where the output power is also correspondingly lower. The simulation results indicate that the DDPG achieves the best energy consumption economy and exhibits the lowest battery lifespan degradation among the three strategies. In contrast, the CD-CS has the highest total equivalent energy consumption and also the highest battery lifespan degradation, suggesting that it operates the battery within the lowest efficiency range.
The proposed multi-objective intelligent EMS based on DDPG achieves a balance between energy consumption economy and battery capacity fade by approximating globally optimal energy consumption economy while inhibiting the lifespan degradation of the battery.

Experimental test implementation and its results

The formulation and offline simulation of the multi-objective intelligent EMS based on DDPG have been completed. In real-world application environments, however, the control model from the offline simulation still requires adaptive adjustment to function properly in real time. The experimental process is summarized as follows. First, the vehicle parameters and preset driving cycle conditions are input into the road load simulation software. Next, the AVL system simulates actual road loads through closed-loop control of the speed and torque of the motor and electric dynamometer. Meanwhile, the real-time electric power is calculated from the DC bus voltage and current. Subsequently, the control system collects the battery SoC and executes the EMS based on it, outputting the target speed and torque of the engine and generator. Finally, the APU coordinated control strategy determines the power output of the APU system. To verify the effectiveness of power trajectory tracking, the calculated APU power was set as the target power, and experiments were conducted on the four strategies. The experimental platform structure is shown in Fig. 9.
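Two of the bench-test steps reduce to simple arithmetic: the real-time electric power is the product of DC bus voltage and current, and tracking quality can be summarized by comparing measured power with the target. The sketch below illustrates both. The sample values, the function names and the choice of mean absolute error as the tracking measure are assumptions made for illustration; the paper does not state which tracking metric it uses.

```python
from typing import Sequence


def dc_power_kw(voltage_v: float, current_a: float) -> float:
    """Electric power on the DC bus in kW, P = U * I."""
    return voltage_v * current_a / 1000.0


def mean_abs_tracking_error(target_kw: Sequence[float],
                            measured_kw: Sequence[float]) -> float:
    """Mean absolute difference between target and measured APU power."""
    if len(target_kw) != len(measured_kw):
        raise ValueError("target and measured traces must have equal length")
    errors = [abs(t - m) for t, m in zip(target_kw, measured_kw)]
    return sum(errors) / len(errors)


# Hypothetical samples: (bus voltage in V, bus current in A).
samples = [(350.0, 100.0), (348.0, 90.0)]
measured = [dc_power_kw(u, i) for u, i in samples]
target = [34.0, 32.0]  # hypothetical target APU power in kW
print(measured, mean_abs_tracking_error(target, measured))
```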

Fig. 9 — Structure of the experimental platform.

The energy consumption economy results from both offline simulations and bench testing are presented in Table 2. As Table 2 illustrates, the results from offline simulations and bench tests are generally consistent, demonstrating good power tracking and meeting dynamic performance requirements. Compared to offline simulations, a slight increase in energy consumption is observed under bench testing, arising from factors such as signal accuracy and online transmission speeds, yet the overall deviation is minimal and remains within an acceptable range. The DDPG-based intelligent EMS proposed in this paper achieves real-time vehicle speed tracking with minimal errors, validating the real-time performance and operational accuracy of the strategy and indicating that the EMS is able to achieve its intended purpose.

Table 2.

Test results of experimental test.

Strategy	SoC₀ = 0.4				SoC₀ = 0.7
Strategy	C_{fuel_ele}	Ge (L/100km)	Q_loss (%)	Strategy	C_{fuel_ele}	Ge (L/100km)	Q_loss (%)
CD-CS	0.77	4.9	12.4	CD-CS	0.76	4.8	11.8
CD-Blend	0.76	4.6	11.2	CD-Blend	0.76	4.6	10.9
DDPG	0.73	4.3	10.5	DDPG	0.72	4.2	10.5


The following conclusions can be drawn from the analysis of Table 2. For the lower initial SoC, the performance of CD-CS and CD-Blend in SoC control is poor, whereas DDPG was able to achieve the SoC control target. From the perspective of fuel consumption cost, C_{fuel_ele} under DDPG is 0.73, which saves 5.2% compared to CD-CS and 3.9% compared to CD-Blend, and Ge is 4.3 L/100km, which saves 12.2% compared to CD-CS and 6.5% compared to CD-Blend. In terms of battery lifespan degradation, under the DDPG strategy the battery output power fluctuations are minimal, leading to less lifespan degradation. Compared to CD-CS, the DDPG strategy reduces battery lifespan degradation by 15.3%; compared to CD-Blend, it reduces battery lifespan degradation by 6.25%, effectively inhibiting battery degradation. Therefore, the DDPG strategy performs well in extending the lifespan of the battery. For the higher initial SoC, similar optimization results are obtained in terms of energy consumption, and it can also be seen that as the SoC increases, the operating efficiency of the battery increases and the overall energy consumption and battery life loss are reduced. The results from bench testing have demonstrated the feasibility and effectiveness of the proposed EMS, achieving satisfactory control performance.
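The savings quoted in this paragraph can be reproduced from the first column group of Table 2. The sketch below does so; the dictionary layout and the names `BENCH` and `saving_percent` are illustrative, and only the numbers are taken from the table.

```python
# First column group of Table 2 (bench-test values from the paper).
BENCH = {
    "CD-CS":    {"C_fuel_ele": 0.77, "Ge": 4.9, "Q_loss": 12.4},
    "CD-Blend": {"C_fuel_ele": 0.76, "Ge": 4.6, "Q_loss": 11.2},
    "DDPG":     {"C_fuel_ele": 0.73, "Ge": 4.3, "Q_loss": 10.5},
}


def saving_percent(baseline: float, value: float) -> float:
    """Reduction of `value` relative to `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


# Saving achieved by DDPG against each rule-based strategy, per metric.
savings = {
    name: {metric: saving_percent(row[metric], BENCH["DDPG"][metric])
           for metric in row}
    for name, row in BENCH.items() if name != "DDPG"
}

for name, row in savings.items():
    print(name, {metric: round(value, 2) for metric, value in row.items()})
```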

Conclusion

This paper’s research is centered on CAR-EEV, with the objective of enhancing the performance of EMSs within a connected environment. A multi-objective intelligent EMS for CAR-EEV is proposed, based on the retrospection of historical traffic scenario information. Based on measured data, a system simulation model and a corresponding lifetime degradation model are established. Additionally, a traffic simulation scenario platform is developed using SUMO. A vehicle speed prediction method grounded in historical traffic scenario retrospection is put forward. The convergence rate of the speed prediction accuracy variation curves across various road segments indicates the regularity of vehicle travel in both temporal and spatial dimensions. Subsequently, a MOO EMS based on DDPG is proposed, comprehensively taking into account energy consumption optimization and battery life degradation. When compared with an EMS based on CD-CS rules and a globally optimal EMS based on DP, the DDPG-based EMS attains an energy consumption closer to the globally optimal result. It reaches 80% of the performance of the DP-based strategy, which is significantly higher than that of the CD-CS-based strategy. In terms of battery life degradation, the DDPG-based strategy reduces it by 10.5%, outperforming the CD-Blend-based strategy, which experiences a 12% degradation. Moreover, compared to the DP-based strategy, the battery life degradation of the DDPG-based strategy increases by 0.7%, yet it decreases by 1.5% compared to the CD-CS-based strategy. Despite the effectiveness and real-time capability demonstrated through bench tests, the research has certain drawbacks owing to objective limitations. Currently, the proposed strategy is applicable only to road segments without traffic lights. Also, the discussion of the speed prediction method is restricted to fixed traffic flow densities, without considering key traffic flow factors such as traffic volume, speed, and density. Future research will focus on improving these areas.

Acknowledgements

The authors gratefully acknowledge the financial support from (1) Research on Collaborative Optimization of Optimal Speed Control and Energy Management for Connected Automated Range-Extended Electric Vehicle (No. 232699HJ0103108631) and (2) Design of Key Mechanical Components and Development of Intelligent Control System for Customized Material Handling Equipment (No. 2023220103000068).

Author contributions

Hanwu Liu managed the project and conceptualized the scheme; Hanwu Liu and Xuewen Zhai conceived the control method; Xuewen Zhai and Wencai Sun completed the modeling, performed the simulation experiments, and finished the manuscript; Xuewen Zhai and Zihang Su collected the data and reviewed the paper.

Data availability

Data is provided within the manuscript or supplementary information files.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Lan, S., Stobart, R. & Chen, R. Performance comparison of a thermoelectric generator applied in conventional vehicles and extended-range electric vehicles. Energy Convers. Manag. 266, 115791 (2022).
2. Chen, B. C., Wu, Y. Y. & Tsai, H. C. Design and analysis of power management strategy for range extended electric vehicle using dynamic programming. Appl. Energy 113, 1764–1774 (2014).
3. Wu, X., Gu, Y. & Xu, M. Adaptive energy management strategy for extended-range electric vehicle based on micro-trip identification. IEEE Access 8, 176555–176564 (2020).
4. Uralde, J. et al. Rule-based operation mode control strategy for the energy management of a fuel cell electric vehicle. Batter. Basel 10(6), 214 (2024).
5. Li, J. et al. A real-time optimization energy management of range extended electric vehicles for battery lifetime and energy consumption. J. Power Sources 498, 229939 (2021).
6. Sun, H. et al. Data-driven reinforcement-learning-based hierarchical energy management strategy for fuel cell/battery/ultracapacitor hybrid electric vehicles. J. Power Sources 455, 227964 (2020).
7. Wang, F., Wen, Q. Y. & Xu, B. An energy saving rule-based strategy for electric-hydraulic hybrid wheel loaders. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 10.1177/09544070231191843 (2023).
8. Zhang, S. & Xiong, R. Adaptive energy management of a plug-in hybrid electric vehicle based on driving pattern recognition and dynamic programming. Appl. Energy 155, 68–78 (2015).
9. Hou, C. et al. Approximate Pontryagin’s minimum principle applied to the energy management of plug-in hybrid electric vehicles. Appl. Energy 115, 174–189 (2014).
10. Hou, Y., Ravey, A. & Marion-Péra, M.-C. Multi-mode predictive energy management for fuel cell hybrid electric vehicles using Markov driving pattern recognizer. Appl. Energy 258, 11405 (2020).
11. Wang, H. et al. Model predictive control-based energy management strategy for a series hybrid electric tracked vehicle. Appl. Energy 182, 105–114 (2016).
12. Xu, D. et al. Recent progress in learning algorithms applied in energy management of hybrid vehicles: a comprehensive review. Int. J. Precis. Eng. Manuf. Green Technol. 10(1), 245–267 (2023).
13. Li, Y. et al. Deep reinforcement learning-based energy management for a series hybrid electric vehicle enabled by history cumulative trip information. IEEE Trans. Veh. Technol. 68(8), 7416–7430 (2019).
14. Lian, R. et al. Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle. Energy 197, 117297 (2020).
15. Liu, H. et al. Research on approximate optimal energy management and multi-objective optimization of connected automated range-extended electric vehicle. Energy 306(000), 20. 10.1016/j.energy.2024.132368 (2024).
16. Tang, X. et al. Battery health-aware and deep reinforcement learning-based energy management for naturalistic data-driven driving scenarios. IEEE Trans. Transp. Electrific. 8(1), 948–964 (2021).
17. Du, A. M., Han, Y. & Zhu, Z. P. Review on multi-objective optimization of energy management strategy for hybrid electric vehicle integrated with traffic information. Energy Sources Part A Recover. Util. Environ. Eff. 44(3), 7914–7933 (2022).
18. Huang, X., Tan, Y. & He, X. G. An intelligent multifeature statistical approach for the discrimination of driving conditions of a hybrid electric vehicle. Intell. Transp. Syst. 12(2), 453–465 (2011).
19. Miao, Q. et al. Construction of typical driving conditions for buses based on clustering and Markov chain. Chin. J. Highw. 29(11), 161–169 (2016).
20. Wang, H. Vehicle speed prediction based on clustering and adaptive neuro fuzzy inference. Comput. Eng. Appl. 44(6), 240–242 (2008).
21. Liu, H. et al. A novel hybrid-point-line energy management strategy based on multi-objective optimization for range-extended electric vehicle. Energy 10.1016/j.energy.2022.123357 (2022).
22. Qian, L. et al. Hierarchical energy management and optimization of hybrid electric vehicles based on V2X. Trans. Chin. Soc. Agric. Eng. 10.11975/j.issn.1002-6819.2016.19.010 (2016).
23. Ha, S. & Lee, H. Energy management strategy based on V2X communications and road information for a connected PHEV and its evaluation using an IDHIL simulator. Appl. Sci. Basel 10.3390/app13169208 (2023).
24. Hu, B. & Li, J. A deployment-efficient energy management strategy for connected hybrid electric vehicle based on offline reinforcement learning. IEEE Trans. Ind. Electron. 10.1109/TIE.2021.3116581 (2022).
25. He, H. et al. An improved energy management strategy for hybrid electric vehicles integrating multistates of vehicle-traffic information. IEEE Trans. Transp. Electrific. 7(3), 1161–1172 (2021).
26. Zhou, Z. et al. A comprehensive study of speed prediction in transportation system: From vehicle to traffic. iScience 10.1016/j.isci.2022.103909 (2022).
27. Song, Z. et al. Energy management strategy for networked extended range vehicles based on reinforcement learning and road condition information. J. Tongji Univ. (Nat. Sci. Ed.) 49(S1), 211–216 (2021).
28. Yuan, H. B. et al. Optimized rule-based energy management for a polymer electrolyte membrane fuel cell/battery hybrid power system using a genetic algorithm. Int. J. Hydrogen Energy 47(12), 7932–7948 (2022).
29. Chen, J. Y. et al. Research progress on power systems and energy management strategies of connected range vehicles. J. Cent. South Univ. (Nat. Sci. Ed.) 55(01), 80–92 (2024).
30. Liu, H. et al. Adaptive parameter optimal energy management strategy based on multi-objective optimization for range extended electric vehicle. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 236(8), 1809–1823. 10.1177/09544070211046406 (2022).
31. Tang, L., Rizzoni, G. & Onori, S. Energy management strategy for HEVs including battery life optimization. IEEE Trans. Transp. Electrific. 1(3), 211–222 (2015).
32. Park, D. et al. Eco-driving profile optimization by dynamic programming for battery electric vehicles. Int. J. Automot. Technol. 25(6), 1309–1321 (2024).

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Data is provided within the manuscript or supplementary information files.

[CR1] 1.Lan, S., Stobart, R. & Chen, R. Performance comparison of a thermoelectric generator applied in conventional vehicles and extended-range electric vehicles. Energy Convers. Manag.266, 115791 (2022). [Google Scholar]

[CR2] 2.Chen, B. C., Wu, Y. Y. & Tsai, H. C. Design and analysis of power management strategy for range extended electric vehicle using dynamic programming. Appl. Energy113, 1764–1774 (2014). [Google Scholar]

[CR3] 3.Wu, X., Gu, Y. & Xu, M. Adaptive energy management strategy for extended-range electric vehicle based on micro-trip identification. IEEE Access8, 176555–176564 (2020). [Google Scholar]

[CR4] 4.Uralde, J. et al. Rule-based operation mode control strategy for the energy management of a fuel cell electric vehicle. Batter. Basel10(6), 214 (2024). [Google Scholar]

[CR5] 5.Li, J. et al. A real-time optimization energy management of range extended electric vehicles for battery lifetime and energy consumption. J. Power Sources498, 229939 (2021). [Google Scholar]

[CR6] 6.Sun, H. et al. Data-driven reinforcement-learning-based hierarchical energy management strategy for fuel cell/battery/ultracapacitor hybrid electric vehicles. J. Power Sources455, 227964 (2020). [Google Scholar]

[CR7] 7.Wang, F., Wen, Q. Y. & Xu, B. An energy saving rule-based strategy for electric-hydraulic hybrid wheel loaders. Proc. Inst. Mech. Eng. Part D J. Automob. Eng.10.1177/09544070231191843 (2023). [Google Scholar]

[CR8] 8.Zhang, S. & Xiong, R. Adaptive energy management of a plug-in hybrid electric vehicle based on driving pattern recognition and dynamic programming. Appl. Energy155, 68–78 (2015). [Google Scholar]

[CR9] 9.Hou, C. et al. Approximate Pontryagin’s minimum principle applied to the energy management of plug-in hybrid electric vehicles. Appl. Energy115, 174–189 (2014). [Google Scholar]

[CR10] 10.Hou, Y., Ravey, A. & Marion-Péra, M.-C. Multi-mode predictive energy management for fuel cell hybrid electric vehicles using Markov driving pattern recognizer. Appl. Energy258, 11405 (2020). [Google Scholar]

[CR11] 11.Wang, H. et al. Model predictive control-based energy management strategy for a series hybrid electric tracked vehicle. Appl. Energy182, 105–114 (2016). [Google Scholar]

[CR12] 12.Xu, D. et al. Recent progress in learning algorithms applied in energy management of hybrid vehicles: a comprehensive review. Int. J. Precis. Eng. Manuf. Green Technol.10(1), 245–267 (2023). [Google Scholar]

[CR13] 13.Li, Y. et al. Deep reinforcement learning-based energy management for a series hybrid electric vehicle enabled by history cumulative trip information. IEEE Trans. Veh. Technol.68(8), 7416–7430 (2019). [Google Scholar]

[CR14] 14.Lian, R. et al. Rule-interposing deep reinforcement learning based energy management strategy for power-split hybrid electric vehicle. Energy197, 117297 (2020). [Google Scholar]

[CR15] 15.Liu, H. et al. Research on approximate optimal energy management and multi-objective optimization of connected automated range-extended electric vehicle. Energy306(000), 20. 10.1016/j.energy.2024.132368 (2024). [Google Scholar]

[CR16] 16.Tang, X. et al. Battery health-aware and deep reinforcement learning-based energy management for naturalistic data-driven driving scenarios. IEEE Trans. Transp. Electrific.8(1), 948–964 (2021). [Google Scholar]

[CR17] 17.Du, A. M., Han, Y. & Zhu, Z. P. Review on multi-objective optimization of energy management strategy for hybrid electric vehicle integrated with traffic information. Energy Sources Part A Recover. Util. Environ. Eff.44(3), 7914–7933 (2022). [Google Scholar]

[CR18] 18.Huang, X., Tan, Y. & He, X. G. An intelligent multifeature statistical approach for the discrimination of driving conditions of a hybrid electric vehicle. Intell. Transp. Syst.12(2), 453–465 (2011). [Google Scholar]

[CR19] 19.Miao, Q. et al. Construction of typical driving conditions for buses based on clustering and Markov chain. Chin. J. Highw.29(11), 161–169 (2016). [Google Scholar]

[CR20] 20.Wang, H. Vehicle speed prediction based on clustering and adaptive neuro fuzzy inference. Comput. Eng. Appl.44(6), 240–242 (2008). [Google Scholar]

[CR21] 21.Liu, H. et al. A novel hybrid-point-line energy management strategy based on multi-objective optimization for range-extended electric vehicle. Energy10.1016/j.energy.2022.123357 (2022).36059383 [Google Scholar]

[CR22] 22.Qian, L. et al. Hierarchical energy management and optimization of hybrid electric vehicles based on V2X. Trans. Chin. Soc. Agric. Eng.10.11975/j.issn.1002-6819.2016.19.010 (2016). [Google Scholar]

[CR23] 23.Ha, S. & Lee, H. Energy management strategy based on V2X communications and road information for a connected PHEV and its evaluation using an IDHIL simulator. Appl. Sci. Basel10.3390/app13169208 (2023). [Google Scholar]

[CR24] 24.Hu, B. & Li, J. A deployment-efficient energy management strategy for connected hybrid electric vehicle based on offline reinforcement learning. IEEE Trans. Ind. Electron.10.1109/TIE.2021.3116581 (2022). [Google Scholar]

[CR25] 25.He, H. et al. An improved energy management strategy for hybrid electric vehicles integrating multistates of vehicle-traffic information. IEEE Trans. Transp. Electrific.7(3), 1161–1172 (2021). [Google Scholar]

[CR26] 26.Zhou, Z. et al. A comprehensive study of speed prediction in transportation system: From vehicle to traffic. Iscience10.1016/j.isci.2022.103909 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Song, Z. et al. Energy management strategy for networked extended range vehicles based on reinforcement learning and road condition information. J. Tongji Univ. (Nat. Sci. Ed.)49(S1), 211–216 (2021). [Google Scholar]

[CR28] 28. Yuan, H. B. et al. Optimized rule-based energy management for a polymer electrolyte membrane fuel cell/battery hybrid power system using a genetic algorithm. Int. J. Hydrogen Energy 47(12), 7932–7948 (2022).

[CR29] 29. Chen, J. Y. et al. Research progress on power systems and energy management strategies of connected range vehicles. J. Cent. South Univ. (Nat. Sci. Ed.) 55(01), 80–92 (2024).

[CR30] 30. Liu, H. et al. Adaptive parameter optimal energy management strategy based on multi-objective optimization for range extended electric vehicle. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 236(8), 1809–1823. doi:10.1177/09544070211046406 (2022).

[CR31] 31. Tang, L., Rizzoni, G. & Onori, S. Energy management strategy for HEVs including battery life optimization. IEEE Trans. Transp. Electrific. 1(3), 211–222 (2015).

[CR32] 32. Park, D. et al. Eco-driving profile optimization by dynamic programming for battery electric vehicles. Int. J. Automot. Technol. 25(6), 1309–1321 (2024).
