ACS Omega. 2026 Feb 18;11(8):13819–13834. doi: 10.1021/acsomega.5c12136

Carbon Emission Intensity Modeling in Freight Operations Using eXplainable AI in Real-World Conditions

Saket Ranjan 1, Shiva Nagendra Saragur Madanayak 1,*
PMCID: PMC12961449  PMID: 41799098

Abstract

The present study evaluates the real-world operational factors and driving behaviors that contribute significantly to CO2 emissions and total energy consumption of port-based heavy-duty vehicles (HDVs). Interpretable machine learning techniques are applied within an eXplainable Artificial Intelligence (XAI) framework to assess the impact of input variables on prediction accuracy. Conventional modeling approaches rely on simplifications that often limit their ability to capture the complex, nonlinear characteristics of vehicular emission determinants, particularly under the dynamic, micro-operational conditions of real-world settings. XGBoost achieved higher predictive accuracy than conventional regression and other ensemble methods, with up to 46% improvement in R² and over 80% reduction in estimation errors. To address the black-box nature of the model, this study adopts XAI techniques, employing SHapley Additive exPlanations (SHAP) to quantify feature contributions and enhance interpretability. The results show that real-world CO2 emission levels remain substantially high under dynamic operational conditions, emphasizing the need for improved transit and freight management strategies to mitigate vehicular emissions. This further reinforces the importance of regulatory frameworks that incorporate CO2 emission and fuel-efficiency standards alongside conventional pollutant limits. Such progressive targets are intended to curb climate impact, stimulate technological innovation, and support long-term low-carbon transition goals.



1. Introduction

The transport sector is a leading energy consumer and remains a key source of air pollutant emissions. Globally, heavy-duty vehicles constitute a limited portion of the vehicle fleet, yet account for a substantial share of fuel consumption and greenhouse gas emissions because of their energy-intensive operations. In India, this sector accounts for about 70% of national diesel consumption. From the 1950s to 2011, road transport in India grew substantially, with its share of passenger traffic increasing from 15 to 86% and its share of freight traffic from 14 to 70%. HDVs serve as an essential link between long-haul freight movement and last-mile distribution, which improves overall operational efficiency. Recent studies have examined the contribution of HDVs to environmental degradation, especially in urban and industrial transport corridors. Globally, HDV diesel consumption varies substantially across regions: even countries with similar fleet sizes differ in fuel intensity owing to differences in freight activity, operational patterns, and vehicle technologies. Figure 1 illustrates 2020 HDV fuel consumption estimates based on the IIASA GAINS model and standardized using a diesel energy content of 35.8 MJ/L, which emphasize these regional disparities in fuel use. Heavy-duty trucks account for approximately 20% of total fuel consumption in the United States, underscoring their disproportionately high energy demand in freight transport. In the European Union, heavy-duty vehicles contribute over 25% of road transport-related greenhouse gas (GHG) emissions and more than 6% of the EU's total GHG output.
Although HDVs are vital to supply chain operations, their emissions are often concentrated near freight corridors, disproportionately affecting communities located near ports, warehouses, and major roads. The steady rise in cargo traffic at Indian ports (Supplementary Figure S1) indicates a sustained intensification of freight activity in recent years, driven by increased demand for bulk commodities and containerized goods. Residents of low- and middle-income countries (LMICs) often live near major roads and freight hubs, resulting in greater exposure to pollutants from heavy-duty vehicles and increased vulnerability to air-quality-related health risks. Studies in port settings have identified the concentrated environmental impacts of HDV activities and suggested targeted operational controls, multimodal solutions, and fleet optimization to lower CO2 emissions. Heavy diesel trucks used in freight operations emit about 2.6 kg of CO2 per liter of fuel burned. The escalating fuel demand underscores the urgent need for sustainable transport solutions and energy policies. The European Parliament and the Council have introduced mandatory CO2 emission performance standards for new heavy-duty vehicles, including a required 15% reduction in average fleet emissions by 2025 (relative to the 2019 baseline) and more ambitious targets for 2030 and 2035. These targets are designed to mitigate the climate impact of HDVs, stimulate technological innovation, and align the transport sector with long-term low-carbon transition goals.

Figure 1. Global diesel consumption by heavy-duty vehicles (HDVs) in 2020, by country. Total fuel consumption (in million liters) highlights significant regional differences in freight-related energy use.

Previous studies have shown that machine learning (ML) techniques improve the accuracy of carbon emission forecasting and support policy decisions through greater model transparency and interpretability. However, challenges such as inconsistent data, limited model clarity, and a lack of standardized evaluation remain significant limitations. Table 1 summarizes the methodological approaches used to model and forecast CO2 emissions in the transportation sector. Micro-operational trends under real-world conditions have received limited attention in previous studies, leaving a notable gap in understanding emissions precisely. Many studies have successfully modeled transport sector GHG emissions at the macro level using diverse methods and annual predictor data sets such as GDP, population, energy use, and vehicle manufacturing data. These approaches have provided valuable insights into long-term emission trends, although their reliance on aggregate data sets leaves room to improve temporal granularity and to capture real-world heterogeneity in localized or dynamic scenarios. Similarly, several recent studies used national inventory data sets derived from manufacturer-reported laboratory driving cycles to estimate fuel consumption and CO2 emissions from light-duty vehicles. While these data sets offer important standardized benchmarks, they could be strengthened by integrating higher temporal resolution and real-world operational dynamics to better reflect localized and dynamic transport conditions.

Table 1. Review of Previous Literature on Modeling Transport-Based Energy and Carbon Emissions

time span | region | evaluation metrics | temporal granularity | data sets/variables | model | model strengths | model limitations
2005–2015 | Taiwan | MAPE | annual | GDP, urban population, energy consumption | MGPM | interpretable, suitable for long-term trends | limited nonlinear capability
1970–2016 | Turkey | R², MAE, MAPE, MSE, RMSE, MBE | annual | energy consumption, population, GDP, year, vehicle kilometer | XGBoost | high accuracy, handles nonlinear relationships | interpretability, tuning-sensitive
1990–2017 | China | APE, MAPE | annual | population, economy, energy, urbanization, transport, industrial structure | PSO+GM | effective for limited data sets, trend forecasting | complex nonlinear dynamics
2020–2023 | UK, Canada | MAE, MAPE | annual | engine power, fuel consumption, CO2 emission, engine capacity | DTR | interpretable decision structure | prone to overfitting, unstable
1995–2014 | Canada | R², MSE, RMSE, MAPE | annual (7 yr compiled) | engine size, fuel consumption, cylinders, CO2 emission, make, model, fuel type | MLP-SHAP | captures nonlinearity, interpretable | data-intensive, computationally demanding
1995–2014 | Canada | R², MSE, RMSE | annual (7 yr compiled) | engine size, fuel consumption, cylinders, CO2 emission, make, model, fuel type | BiLSTM | temporal dependency modeling | black-box, high training requirement
1997–1999 | India | RMSE, MBE, MSE, R² | hourly | vehicular source strength (CO and NO2), meteorology (17 feature variables) | ANN | flexible nonlinear modeling | transparency, overfitting risk
2014 | Poland | MSE, R² | per second | CO2, velocity, acceleration, engine load, RPM, altitude, manifold, turbo boost | XGBoost | robust for high-resolution data | feature engineering sensitive
2014–2020 | Canada | R², RMSE, adj. R² | annual | engine size, cylinders, fuel type, consumption, transmission, make, class, CO2 | XGB, RF, LGBM | ensemble robustness, strong predictive power | reduced physical interpretability
2011–2013 | Japan | R², MAE, RMSE | per minute | energy consumption, trip distance, speed, air conditioner, ambient temperature, road gradient | LightGBM | fast training, handles large data sets | weaker temporal memory
2020 | Canada | R², RMSE | per minute | speed, density, traffic, flow, delay, CO2 | LSTM | effective for sequential traffic patterns | data-hungry, limited explainability
2015–2016 | India | RMSE, MBE, IA | per second, 5 h | vehicle speed, acceleration, VSP, engine RPM, oil temperature | NARX | captures dynamic system feedback | sensitive to noise, parameter selection
2010–2013 | US | CV, MAD, SD | 30 s | speed, accel./decel., driving events and behaviors | K-means, Autonomie | behavior pattern discovery | nonpredictive, cluster-dependent
1990–2019 | UK | RMSE, rRMSE, MAPE, MAE | annual | energy consumption, population, electricity, unemployment, and 24 other features | SVR-RBF, RF, SHAP | balanced accuracy and interpretability | computational complexity
1970–2021 | Turkey | MAE, RMSE, R², sMAPE | annual | CO2, oil consumption, GDP, population, electricity, vehicle count, energy supply | ANN, GBDT | strong nonlinear learning | reduced physical interpretability
2024 | China | R², RMSE | 5 s | speed, VSP, acceleration, road slope, CO2 | DL-DTCEM | high-resolution dynamic modeling | limited generalizability, high data demand

Advanced freight management relies on improving operational efficiency within a sustainable framework. Yet limited real-world evidence on HDV operations restricts policy strategies to curb energy and emission footprints. Carbon emissions from road transport have been forecast using freight activity and macro-modal trends, although the uncertainties of long-term projections were acknowledged owing to changing economic and policy dynamics. An artificial neural network (ANN)-based line source model was introduced to predict urban vehicular emissions using field data on traffic flow, vehicle classification, and meteorological parameters, highlighting the effectiveness of machine learning in analyzing large-scale and complex emission data sets. Gradient boosting regression was applied to assess transportation-related CO2 emissions at the macro level, identifying road density, industrial activity, and vehicle ownership as the primary contributing factors. Studies of carbon emissions from road freight transport demonstrated that tree-based models, particularly decision trees, outperformed traditional linear approaches in predicting vehicle CO2 emissions, offering improved estimation accuracy. A hybrid approach integrating machine learning techniques to forecast urban carbon emissions was proposed and demonstrated approximately 20% improvement in predictive accuracy over individual models, although data sparsity in some regions posed challenges. XGBoost has also been applied to GPS-based microtrip analysis, trained on Motor Vehicle Emission Simulator (MOVES) data, to predict emissions. A novel carbon emission forecasting method that integrates the environmental Kuznets curve hypothesis with a nonlinear multivariate grey model was developed to improve prediction accuracy for transportation sector emissions. However, the generalizability of the approach was limited by variations in operational factors and by its applicability to other regions.
An ensemble model integrating Random Forest (RF) and XGBoost has been used for transient CO2 and NOx prediction, identifying real-time driving variables as key emission determinants. The application of machine learning, particularly XGBoost, to microscale dynamics improved emission prediction accuracy compared with traditional models. Model-agnostic explanation tools such as SHAP and partial dependence plots facilitate evidence-based policy development, as their visual interpretability enables effective communication of complex model outputs.

The limited interpretability of machine learning models poses significant challenges, especially in contexts requiring transparent and explainable decision making, and SHAP has recently been applied across various domains to address this limitation. This study develops an integrated, data-driven framework that links real-world driving characteristics with CO2 emission profiles, using micro-operational data collected from 14 HDVs engaged in port-based cargo transfer. To address the difficulty of capturing complex, nonlinear emission dynamics, the framework applies interpretable machine learning techniques within an eXplainable AI (XAI) setting to evaluate variable impacts on predictive performance. This approach provides insight into emission-sensitive operational regimes, offering an analytical basis for targeted freight management strategies, improved operational efficiency, and measures to curb emissions from HDV fleets.

2. Materials and Methods

2.1. Study Site and Measurement Design

Field measurements were performed at Chennai Port, one of the largest and oldest major ports in India, situated on the Bay of Bengal, as shown in Figure 2. Covering around 590 acres of land and 420 acres of water, the port functions as a major hub for container trade, automobile exports, and bulk cargo along the eastern coast of India. It has three docks, 24 berths, and a draft depth ranging from 12 to 16.5 m, supporting an annual cargo handling capacity of up to 60 million tonnes.

Figure 2. Geographic location and layout of Chennai Port, India. Field photographs showing port operations, test vehicles, and on-board measurement systems were taken by the authors. The Chennai Port layout map was sourced from the Chennai Port Authority website.

Measurements under real-world vehicle operating conditions were conducted to characterize vehicle emissions specific to port activities. The monitoring campaign took place during routine daytime operations (10:00–16:00) at the port terminal, aligned with scheduled cargo handling activities. Each monitored trip included multiple loops along a predefined internal route, encompassing both loading and unloading operations. Vehicle movement along the berths followed a two-way traffic system. The study involved 14 heavy-duty diesel vehicles (HDDVs) comprising tractor–trailer configurations broadly corresponding to Class 7–8 vehicles, as summarized in Table 2. The containers used in port operations had a maximum gross weight of 30.48 tonnes, including a tare weight of approximately 2.19 tonnes. In routine operations, these vehicles typically carry payloads of approximately 18 to 30 tonnes, depending on container size, cargo density, and operational constraints. Fully loaded operations generally involve a single 20-ft standard ISO container, commonly referred to as a Twenty-foot Equivalent Unit (TEU), as summarized in Supplementary Table S1. The TEU is a standardized unit of measurement used in shipping and port operations. Off-loaded operations, by contrast, involve empty containers or trailers with negligible payload (without containers). The reported vehicle weight range reflects real-world port operating conditions, in which containers are frequently moved partially loaded or at intermediate logistics stages and are constrained by axle-load compliance and safety considerations. The monitored vehicles were selected to represent the dominant vehicle configurations used in routine port operations. These internal transfer vehicles (ITVs) include both articulated (tractor–trailer) and rigid vehicles employed for container and freight movement within the port terminal.
Key parameters such as vehicle age (in years), cumulative mileage (in kilometres), and curb weight (in tonnes) were recorded manually. None of the vehicles was fitted with advanced after-treatment systems. A total of 375,000 second-by-second data samples were collected. GPS and emission data were recorded independently and later synchronized by timestamp, enabling analysis of vehicle position, activity patterns, and associated emissions. All field monitoring adhered to standardized safety protocols in accordance with the International Maritime Dangerous Goods (IMDG) Code, issued by the International Maritime Organization, ensuring safe practices during container and commercial cargo handling.

Table 2. Summary of Internal Transfer Vehicle Characteristics Monitored during Port Operations

vehicle ID (ITV) | engine displacement (cc) | vehicle age | emission standard | cabin | unladen weight (kg) | chassis type | load state
1 | 5660 | 8 yr 10 m | BS-III | dual cab | 15,320 | articulated | unloaded
2 | 5883 | 6 yr 10 m | BS-IV | dual cab | 15,560 | rigid | unloaded
3 | 5700 | 7 yr 4 m | BS-IV | dual cab | 15,560 | rigid | unloaded
4 | 5900 | 7 yr 8 m | BS-IV | single cab | 13,900 | articulated | unloaded
5 | 5883 | 6 yr 10 m | BS-IV | dual cab | 14,740 | articulated | unloaded
6 | 5660 | 3 yr 6 m | BS-IV | dual cab | 15,320 | articulated | unloaded
7 | 5660 | 7 yr 11 m | BS-III | single cab | 13,900 | articulated | unloaded
8 | 5700 | 5 yr 3 m | BS-IV | dual cab | 15,560 | rigid | loaded
9 | 5660 | 9 yr 10 m | BS-III | dual cab | 15,320 | articulated | loaded
10 | 5660 | 8 yr 9 m | BS-III | dual cab | 15,560 | rigid | loaded
11 | 5700 | 9 yr 10 m | BS-III | single cab | 14,020 | rigid | loaded
12 | 5660 | 7 yr 8 m | BS-IV | dual cab | 15,560 | rigid | loaded
13 | 5660 | 6 yr 2 m | BS-IV | single cab | 14,020 | rigid | loaded
14 | 5990 | 10 yr 4 m | BS-III | dual cab | 15,560 | rigid | loaded

Abbreviations: yr = years; m = months; cc = cubic centimeters; BS = Bharat Stage (Indian emission standard). Articulated refers to HDVs with a jointed chassis (tractor–trailer configuration).

2.2. On-Board Emission and Fuel Use Characterization

Vehicular emission characteristics were derived from second-by-second operational parameters. Vehicle speed and altitude were recorded concurrently at 1 Hz using hand-held GPS units (Garmin e-Trex 20X). To ensure accuracy, on-board measurements, acceleration–deceleration (AD) data, and positional data were time-synchronized and logged at one-second resolution. Emissions were measured using an on-board portable emissions measurement system (PEMS) built around an AVL Ditest, a certified exhaust gas analyzer approved by the Automotive Research Association of India (ARAI) for testing and validation. An exhaust gas sampling probe, control unit, and data logger were integrated for real-time acquisition and powered by an uninterruptible power supply (UPS). The components were installed in the driver compartment (Figure 2). The analyzer uses nondispersive infrared (NDIR) detection for CO2. The system was zero- and span-calibrated before and after each test cycle, with a CO2 accuracy of ±0.3% vol. The vehicular emission characteristics were determined from instantaneous operational metrics, as given in eq 1.

$$E(\theta) = f(V_x) \tag{1}$$

where $E(\theta)$ represents the emission factor for pollutant $\theta$, and $V_x$ denotes vehicle operational parameters such as speed, exhaust flow, and air–fuel ratio (AFR). The emission factor was determined from the instantaneous CO2 concentration measured at one-second intervals, expressed in either volume percentage (% vol) or parts per million (ppm).
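Eq 1 leaves implicit how a measured concentration becomes a mass emission rate. The sketch below shows one common conversion, assuming (this is not stated in the study) that the exhaust volumetric flow is known and referenced to 0 °C and 1 atm, where an ideal gas occupies 22.414 L/mol. The helper names are hypothetical.

```python
# Hypothetical helpers: CO2 concentration -> mass emission rate.
# ASSUMPTION: exhaust volumetric flow is known and referenced to 0 °C, 101.325 kPa.
MOLAR_VOLUME_M3_PER_MOL = 0.022414  # ideal-gas molar volume at 0 °C and 1 atm
M_CO2_G_PER_MOL = 44.01             # molar mass of CO2


def pct_vol_to_ppm(pct_vol: float) -> float:
    """Convert a volume percentage to parts per million (1 % vol = 10,000 ppm)."""
    return pct_vol * 10_000.0


def co2_mass_rate_g_per_s(pct_vol: float, exhaust_flow_m3_per_s: float) -> float:
    """CO2 mass rate (g/s) from a % vol reading and the exhaust volumetric flow."""
    co2_flow_m3_per_s = (pct_vol / 100.0) * exhaust_flow_m3_per_s  # CO2 share of the flow
    co2_mol_per_s = co2_flow_m3_per_s / MOLAR_VOLUME_M3_PER_MOL    # ideal-gas moles
    return co2_mol_per_s * M_CO2_G_PER_MOL


if __name__ == "__main__":
    # Illustrative numbers only: 10 % vol CO2 in 0.1 m^3/s of exhaust (about 19.6 g/s)
    print(co2_mass_rate_g_per_s(10.0, 0.1))
```

If the flow is referenced to a different temperature, the molar volume must be adjusted accordingly (for example, about 24.45 L/mol at 25 °C).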

Fuel consumption was estimated with the carbon balance method (eq 2), based on the emission rates of carbon-containing pollutants. The adopted carbon mass fractions were 0.273 for CO2, 0.429 for CO, and 0.866 for HC.

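$$FR(t) = \frac{0.273 \times ER_{\mathrm{CO_2}}(t) + 0.429 \times ER_{\mathrm{CO}}(t) + 0.866 \times ER_{\mathrm{HC}}(t)}{W_C} \tag{2}$$
$$FE_{\mathrm{bulk}} = \frac{360 \times FR}{\rho_d \times \nu} \tag{3}$$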

where FR is the average fuel consumption rate (g/s) of the vehicle, t is time in seconds, and $W_C$ is the fuel carbon content coefficient (0.866 for diesel). Distance-specific fuel efficiency was calculated in liters per 100 km using eq 3, with a diesel density of $\rho_d$ = 0.85 kg/L following USEPA; $FE_{\mathrm{bulk}}$ is the bulk fuel economy in L/100 km, and $\nu$ is the vehicle speed in km/h.
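Eqs 2 and 3 translate directly into code. This is a minimal sketch that assumes the emission rates are already in g/s (as per-second PEMS output would be); the function names are illustrative, and the constants are those given in the text.

```python
# Sketch of eqs 2-3 (carbon balance). Emission rates are assumed to be in g/s.
W_C = 0.866    # carbon mass fraction of diesel fuel (from the text)
RHO_D = 0.85   # diesel density in kg/L (from the text)


def fuel_rate_g_per_s(er_co2: float, er_co: float, er_hc: float) -> float:
    """Eq 2: carbon emitted as CO2, CO and HC, divided by the fuel's carbon fraction."""
    carbon_g_per_s = 0.273 * er_co2 + 0.429 * er_co + 0.866 * er_hc
    return carbon_g_per_s / W_C


def bulk_fuel_economy_l_per_100km(fr_g_per_s: float, speed_km_h: float) -> float:
    """Eq 3: L/100 km. The factor 360 = 3600 s/h * 100 km / (1000 g/kg)."""
    if speed_km_h <= 0:
        raise ValueError("L/100 km is undefined at zero speed (idling)")
    return 360.0 * fr_g_per_s / (RHO_D * speed_km_h)


if __name__ == "__main__":
    fr = fuel_rate_g_per_s(er_co2=10.0, er_co=0.0, er_hc=0.0)      # about 3.15 g/s
    print(fr, bulk_fuel_economy_l_per_100km(fr, speed_km_h=36.0))  # about 37.1 L/100 km
```

Because eq 3 is undefined while a vehicle idles, one option for trip-level comparisons is to divide total fuel by total distance rather than averaging per-second L/100 km values.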

2.3. Machine Learning Techniques

The ML techniques applied in the analysis are described in the following subsections. The hyperparameter search space and the selected optimal values for each model are summarized in Table 3.

Table 3. Search Space for Hyperparameter Tuning

model | hyperparameter | search space | selected value
MLR | linear: ordinary least squares | – | no regularization
LASSO | alpha: L1 penalty strength; controls sparsity | – | 0.0012
Bayesian Ridge | alpha_1: prior over weights (gamma distribution) | – | 1 × 10⁻⁶
Bayesian Ridge | alpha_2: prior over noise (gamma) | – | 1 × 10⁻⁴
Bayesian Ridge | lambda_1: prior over weights precision | – | 1 × 10⁻⁴
Bayesian Ridge | lambda_2: prior over noise precision | – | 1 × 10⁻⁶
Random Forest | learning rate (η) | [0.01–0.3] | 0.07
Random Forest | N estimators (trees) | [50–300] | 200
Random Forest | maximum tree depth | [3–20] | 19
Random Forest | min samples split | [0.5–1.0] | 6
Random Forest | min samples leaf | [0.5–1.0] | 1
Random Forest | max features | [auto, sqrt, log2] | sqrt
XGBoost | learning rate (η) | [0.01–0.3] | 0.07
XGBoost | N boosting rounds (trees) | [50–300] | 250
XGBoost | gamma | [0–0.5] | 0.048
XGBoost | maximum tree depth | [3–10] | 7
XGBoost | subsample ratio of training instances | [0.5–1.0] | 0.6
XGBoost | subsample ratio of columns by tree | [0.5–1.0] | 0.8
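For readers reproducing the comparison, the Table 3 selections can be expressed as keyword arguments for scikit-learn and XGBoost. This mapping is an assumption (the study does not publish its code); the parameter names are those of `LinearRegression`, `Lasso`, `BayesianRidge`, `RandomForestRegressor`, and `XGBRegressor`. Table 3 lists a learning rate for Random Forest, but `RandomForestRegressor` has no such parameter, so it is omitted here.

```python
# Hypothetical mapping of the Table 3 selections onto scikit-learn / xgboost
# keyword arguments (an assumption, not the authors' published code).
MODEL_PARAMS = {
    "MLR": {},  # ordinary least squares, no regularization
    "LASSO": {"alpha": 0.0012},
    "BayesianRidge": {"alpha_1": 1e-6, "alpha_2": 1e-4,
                      "lambda_1": 1e-4, "lambda_2": 1e-6},
    "RandomForest": {"n_estimators": 200, "max_depth": 19,
                     "min_samples_split": 6, "min_samples_leaf": 1,
                     "max_features": "sqrt"},
    "XGBoost": {"learning_rate": 0.07, "n_estimators": 250, "gamma": 0.048,
                "max_depth": 7, "subsample": 0.6, "colsample_bytree": 0.8},
}


def build_models():
    """Instantiate the five regressors; requires scikit-learn and xgboost installed."""
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import BayesianRidge, Lasso, LinearRegression
    from xgboost import XGBRegressor

    return {
        "MLR": LinearRegression(**MODEL_PARAMS["MLR"]),
        "LASSO": Lasso(**MODEL_PARAMS["LASSO"]),
        "BayesianRidge": BayesianRidge(**MODEL_PARAMS["BayesianRidge"]),
        "RandomForest": RandomForestRegressor(**MODEL_PARAMS["RandomForest"]),
        "XGBoost": XGBRegressor(**MODEL_PARAMS["XGBoost"]),
    }
```

Keeping the values in a plain dictionary makes the tuned settings easy to log alongside the results, and the imports are deferred so the table of values can be inspected without the libraries installed.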

2.3.1. MLR

Multiple Linear Regression (MLR) is a standard technique for predicting the value of a dependent variable $Y_i$ from a set of independent variables $X_{ij}$. The model is shown in eq 4.

$$Y_i = \sum_{j=1}^{k} \beta_j X_{ij} + \varepsilon_i \tag{4}$$

where $k$ is the total number of independent variables, $\beta_j$ is the coefficient for variable $j$, $X_{ij}$ is the value of the $j$th variable for observation $i$, and $\varepsilon_i$ is the residual error term.
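A minimal, dependency-free sketch of fitting eq 4 by ordinary least squares through the normal equations $(X^\top X)\beta = X^\top y$. Because eq 4 has no separate intercept term, a column of ones is added to X when one is wanted. The function names are illustrative; in practice a library routine such as `numpy.linalg.lstsq` is numerically preferable.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):  # eliminate this column from every other row
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y):
    """OLS coefficients for eq 4 from the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # First column of ones acts as an intercept; the data satisfy y = 1 + 2 * x exactly.
    X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
    y = [3.0, 5.0, 7.0, 9.0]
    print(fit_mlr(X, y))  # approximately [1.0, 2.0]
```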

2.3.2. Bayesian Ridge

Bayesian ridge regression (BRR) estimates probability distributions for the model parameters, accounting for data uncertainty and using prior knowledge to make the estimates more reliable. Its coefficient estimate is obtained from the penalized objective in eq 5.

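$$\hat{\beta} = \arg\min_{\beta}\left\{\sum_{i=1}^{n}\left[y_i - f(x_i)\right]^2 + \lambda\sum_{j=1}^{k}\beta_j^2\right\} \tag{5}$$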

where $f(x_i)$ is the model prediction for the $i$th observation. The first term, $\sum_{i=1}^{n}[y_i - f(x_i)]^2$, is the residual sum of squares (RSS) across all $n$ observations, and the second, $\sum_{j=1}^{k}\beta_j^2$, is the sum of squared regression coefficients. The regularization parameter $\lambda$, which reflects prior knowledge about the parameters before the data are observed, balances data fit against this L2 penalty.
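The link between eq 5 and the Bayesian view can be made concrete: under a Gaussian likelihood with noise precision α and a zero-mean Gaussian prior with precision λ_w on the coefficients, the posterior mode minimizes eq 5 with penalty λ = λ_w/α. The sketch below evaluates the eq 5 objective and its closed-form minimizer for a single predictor without intercept; the function names are illustrative, and a full BRR implementation (such as scikit-learn's `BayesianRidge`) additionally estimates α and λ_w from the data.

```python
def ridge_objective(X, y, beta, lam):
    """Eq 5 objective: residual sum of squares plus lam * sum of squared coefficients."""
    rss = sum((yi - sum(b * xij for b, xij in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    return rss + lam * sum(b * b for b in beta)


def ridge_single_feature(x, y, lam):
    """Closed-form minimizer of eq 5 for one predictor without intercept:
    beta = sum(x * y) / (sum(x ** 2) + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def map_penalty(noise_precision, weight_precision):
    """Penalty lam in eq 5 implied by Gaussian noise and a Gaussian weight prior."""
    return weight_precision / noise_precision


if __name__ == "__main__":
    x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    print(ridge_single_feature(x, y, 0.0))   # 2.0, the OLS slope
    print(ridge_single_feature(x, y, 14.0))  # 1.0, shrunk toward zero
```

Larger λ pulls the coefficient toward zero; with λ = 0 the estimate reduces to ordinary least squares.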

2.3.3. LASSO

In Least Absolute Shrinkage and Selection Operator (LASSO) regression, the ridge penalty is replaced with an L1-norm penalty, $\lambda\sum_{i=1}^{M}|\beta_{h,i}|$. This gives the estimator in eq 6, which is used to calculate the model's coefficients.

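$$\hat{\beta}_h = \arg\min_{\beta_h}\left[\sum_{t=1}^{T-h}\left(y_{t+h} - \beta_h^{\top} x_t\right)^2 + \lambda\sum_{i=1}^{M}\left|\beta_{h,i}\right|\right] \tag{6}$$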

The model selects the features whose coefficients remain nonzero after regularization. Notably, LASSO can perform effectively even when the number of features exceeds the number of samples. Here, $T$ denotes the sample size, $M$ the total number of features, $h$ the forecast horizon, and $\lambda$ a hyperparameter that controls the degree of shrinkage.
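LASSO has no closed-form solution in general; cyclic coordinate descent with soft-thresholding is one standard solver. The sketch below minimizes the eq 6 objective with h = 0, that is, Σ_t (y_t − βᵀx_t)² + λΣ|β_i|, and shows how small coefficients are set exactly to zero. The names are illustrative. Note that scikit-learn's `Lasso` scales the squared-error term by 1/(2n), so its `alpha` corresponds to λ/(2n) here.

```python
def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, lam, max_iter=1000, tol=1e-10):
    """Cyclic coordinate descent for sum_t (y_t - beta . x_t)^2 + lam * sum_i |beta_i|."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    residual = list(y)  # y - X beta, with beta starting at zero
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(m):
            z_j = sum(X[t][j] ** 2 for t in range(n))
            if z_j == 0.0:
                continue  # an all-zero feature keeps a zero coefficient
            # Correlation of feature j with the residual that excludes feature j's own term
            rho_j = sum(X[t][j] * (residual[t] + X[t][j] * beta[j]) for t in range(n))
            new_bj = soft_threshold(rho_j, lam / 2.0) / z_j
            delta = new_bj - beta[j]
            for t in range(n):
                residual[t] -= X[t][j] * delta  # keep residual = y - X beta
            beta[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    # The weak second feature is dropped (coefficient exactly 0.0) at lam = 1
    print(fit_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.1], lam=1.0))  # [2.5, 0.0]
```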

2.3.4. Random Forest

Random Forest (RF) is an ensemble method that builds multiple decision trees and combines their results. Each tree chooses the split at each node that yields the largest impurity reduction (information gain); for regression, impurity is typically measured as variance. In decision trees, splits with the highest impurity reduction typically appear near the root, whereas those with lower reductions are found toward the leaves. Consequently, selective pruning at specific nodes can help extract a subset of the most relevant features.
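The split criterion can be illustrated for a single regression tree and a single feature. This sketch scores every candidate threshold by the decrease in mean squared error (variance), the usual regression impurity. It deliberately omits the bootstrap sampling and random feature subsets (the max features setting in Table 3) that make a forest random, and the function names are illustrative.

```python
def mse_impurity(values):
    """Mean squared deviation from the mean (node variance)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, impurity_decrease) maximizing the weighted MSE reduction."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = mse_impurity([v for _, v in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        weighted = (len(left) * mse_impurity(left) + len(right) * mse_impurity(right)) / n
        decrease = parent - weighted
        if decrease > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, decrease)
    return best


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0]))  # (2.5, 4.0)
```

Summing such decreases over all splits that use a feature, across all trees, is the basis of impurity-based feature importance.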

2.3.5. XGBoost

Extreme Gradient Boosting (XGB) employs second-order (Newton) tree boosting to fit additive decision tree models. In this framework, decision trees act as base learners, performing binary feature splits (if–else conditions) to either classify samples in classification tasks or predict continuous outcomes in regression tasks. The generalization performance of the resulting model is evaluated in Figure 4. For a data set of $n$ examples, $D = \{(x_i, y_i)\}_{i=1}^{n}$, the prediction of a single regression tree $f_t$ at iteration $t$ is given by eq 7.

$$f_t(x_i) = \sum_{j=1}^{T} w_j\, \mathbb{I}(x_i \in R_j) \tag{7}$$

where $R_j$ denotes region (leaf) $j$, $w_j$ is the leaf output, $T$ is the number of leaves, and $\mathbb{I}$ is the indicator function. XGB optimizes its learning process by incrementally adding $f_t(x_i)$ to the ensemble of base learners to minimize the regularized objective function, whose second-order approximation at iteration $t$ is given in eq 8.

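$$\tilde{U}^{(t)} = \sum_{j=1}^{T}\left[\Big(\sum_{i:\,x_i \in R_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i:\,x_i \in R_j} h_i\Big) w_j^2\right] = \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2} H_j w_j^2\right] \tag{8}$$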

where $G_j = \sum_{i:\,x_i \in R_j} g_i$ and $H_j = \sum_{i:\,x_i \in R_j} h_i$, and $g_i$ and $h_i$ are the first and second derivatives of the loss function with respect to the prediction from the previous iteration. In each iteration, the optimal leaf weights $w_j$ for a fixed tree structure are derived by setting the derivative of the objective to zero (eq 9).

$$w_j^{*} = -\frac{G_j}{H_j}, \quad j = 1, \ldots, T \tag{9}$$

Figure 4. Model evaluation of XGBoost for CO2 emissions (kg h⁻¹). (a) Actual vs predicted values for the training set (left) and test set (right). (b) Residual plots for the training (5-fold cross-validation) and test data sets.

When building the tree structure, XGB evaluates candidate splits by their ability to reduce the objective. The quality of a tree structure is scored by the optimal objective value in eq 10, and the gain from a potential split is computed as in eq 11.

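$$\mathrm{obj}^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j} \tag{10}$$
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L} + \frac{G_R^2}{H_R} - \frac{(G_L + G_R)^2}{H_L + H_R}\right] \tag{11}$$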

where $G_L$, $H_L$ and $G_R$, $H_R$ are the gradient and Hessian sums of the left and right child nodes, respectively. Splits that result in negative or low gain are pruned bottom-up to avoid overfitting and reduce complexity.
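Eqs 9 and 11 can be checked numerically with squared-error loss ℓ = ½(y − ŷ)², for which g_i = ŷ_i − y_i and h_i = 1. The sketch follows the equations as written (no L2 leaf penalty). The optional `reg_lambda` argument is a hypothetical addition that mirrors the λ term the XGBoost library adds to each H (default 1); the library additionally subtracts the `gamma` value of Table 3 from the gain.

```python
def leaf_weight(G: float, H: float, reg_lambda: float = 0.0) -> float:
    """Eq 9: optimal leaf weight -G / H (reg_lambda = 0 matches the equation as written)."""
    return -G / (H + reg_lambda)


def split_gain(g, h, left, right, reg_lambda: float = 0.0) -> float:
    """Eq 11: gain from splitting a node into the `left` and `right` index sets."""
    GL, HL = sum(g[i] for i in left), sum(h[i] for i in left)
    GR, HR = sum(g[i] for i in right), sum(h[i] for i in right)

    def score(G, H):
        return G * G / (H + reg_lambda)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR))


if __name__ == "__main__":
    # Squared-error loss: g_i = y_hat_i - y_i and h_i = 1
    y = [1.0, 1.0, 5.0, 5.0]
    y_prev = [3.0] * 4                      # prediction from the previous iteration
    g = [p - t for p, t in zip(y_prev, y)]  # [2, 2, -2, -2]
    h = [1.0] * 4
    print(split_gain(g, h, [0, 1], [2, 3]))                              # 8.0
    print(leaf_weight(g[0] + g[1], 2.0), leaf_weight(g[2] + g[3], 2.0))  # -2.0 2.0
```

The split that separates the two groups gives leaf weights of −2 and +2, which move the previous prediction of 3 onto the targets 1 and 5; in a full booster these weights are further scaled by the learning rate η.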

2.4. Analytical Framework of Integrated Model Development

This section outlines the analytical procedures used to evaluate the performance of the machine learning models, including feature extraction, training and validation, and accuracy assessment through statistical and error-based metrics. Analyses were performed on instantaneous fuel consumption, real-world operational parameters, and corresponding CO2 emissions. Normal probability and Q–Q plots were used to evaluate the distributional characteristics of the features and the extent to which they conformed to normality (Figure S3). The Kolmogorov–Smirnov test yielded a statistic of 0.127 (p < 0.05) across speed bins, confirming a significant deviation of the CO2 emission data from normality (Figure S5) and thereby justifying the use of nonlinear modeling approaches. A two-factor ANOVA was performed on the large micro-operational data set to evaluate the influence of speed bins and vehicle loading states on CO2 emissions and fuel consumption (Table S4). All resulting p-values were below 0.05, indicating statistically significant differences in mean emissions and fuel use across speed bins and between loading conditions. Pearson correlation coefficients were calculated to identify potential multicollinearity among the independent variables (Figure 3). Finally, all numerical variables were scaled to a common range using min–max normalization. While the focus of this study is on predicting CO2 emissions under real-world operational conditions, it also applies a multilevel data approach integrated with ML to develop and compare five models: MLR, LASSO, BRR, RF, and XGB. The algorithms were benchmarked in terms of predictive accuracy, reliability, and generalization, facilitating the identification of key influencing predictors and the selection of the optimal model.
In the XGB approach, tree models act as base learners, applying if–else (true–false) feature splits to classify data points in classification problems or to predict continuous values in regression problems. The algorithmic architecture illustrated in Figure 5 depicts the input feature vectors (x) and the independent and identically distributed random vectors ($w_j$). The XGBoost model aggregates the weighted contributions of all individual decision trees to generate the final predictive output. Furthermore, partial dependence and SHAP analyses were employed to assess the impact of individual features on the model predictions. Grounded in cooperative game theory, SHAP assigns each variable a weighted contribution to the model predictions.
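The preprocessing checks described above (min–max scaling, Pearson correlation, and a Kolmogorov–Smirnov statistic) can be sketched without external libraries. Two assumptions apply: the KS statistic is computed against a normal distribution fitted with the sample's own mean and standard deviation, and no p-value is returned, because estimating the parameters from the same sample makes the standard KS p-values too conservative (the Lilliefors correction addresses this). The function names are illustrative.

```python
import math


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ks_statistic_vs_normal(sample):
    """One-sample KS statistic D = max |F_n(x) - Phi(x)| against a normal
    distribution with the sample's own mean and (n - 1) standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    mu = sum(xs) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in xs) / (n - 1))
    d = 0.0
    for i, v in enumerate(xs):
        cdf = 0.5 * (1.0 + math.erf((v - mu) / (sd * math.sqrt(2.0))))
        # Compare the normal CDF with the empirical CDF just after and just before v
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d
```

In a real pipeline, the min–max bounds would be computed on the training folds only and then applied to the test fold, so that no information from the test data leaks into scaling.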

Figure 3. Pearson correlation matrix of input features.

Figure 5. Schematic representation of the ML-based modeling framework.

2.5. Hyperparameter Settings

Effective hyperparameter optimization is critical for improving the performance of machine learning models. A preliminary study was conducted in which parameters were optimized manually to assess model performance and analyze the influence of different features on CO2 emissions and energy consumption. The optimal hyperparameter values identified during tuning for each ML model are presented in Table 3.

2.6. Model Evaluation Metrics

In this study, data processing and computational analysis were carried out in Google Colab, a cloud-based Python environment built on Jupyter notebooks, for training and testing the models. Model performance was evaluated with 5-fold cross-validation by partitioning the data set into five equal subsets: the model was iteratively trained on four folds and tested on the remaining fold, rotating until each subset had served as the test set once. The model selection process involved evaluating all possible combinations of predictor variables, with the optimal model determined from four key performance metrics: the coefficient of determination (R²), root mean squared error (RMSE), mean absolute error (MAE), and mean squared error (MSE), as defined in eqs 12–14 (MSE is the square of RMSE).

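$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left[y(u_i,v_i) - \hat{y}(u_i,v_i)\right]^2}{\sum_{i=1}^{n}\left[y(u_i,v_i) - \bar{y}(u_i,v_i)\right]^2} \tag{12}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[y(u_i,v_i) - \hat{y}(u_i,v_i)\right]^2} \tag{13}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y(u_i,v_i) - \hat{y}(u_i,v_i)\right| \tag{14}$$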

where $n$ is the number of observations, $y(u_i, v_i)$ is the actual value for observation $i$, $\hat{y}(u_i, v_i)$ is the predicted value for observation $i$, and $\bar{y}(u_i, v_i)$ is the mean of the actual values.
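Eqs 12–14 and the 5-fold rotation described above can be sketched in plain Python. The fold generator below uses contiguous, near-equal folds and is an illustrative assumption; the text does not state whether records were shuffled before splitting. The function names are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, MSE, RMSE and MAE as in eqs 12-14."""
    n = len(y_true)
    residuals = [a - b for a, b in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)
    mean_y = sum(y_true) / n
    sst = sum((a - mean_y) ** 2 for a in y_true)
    mse = sse / n
    return {
        "R2": 1.0 - sse / sst,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # R2 = 0.5
    for train, test in kfold_indices(10, k=5):
        print(len(train), test)
```

With 1 Hz data, neighboring seconds are typically autocorrelated, so randomly shuffled folds can place near-duplicate records in both training and test sets and give optimistic scores; contiguous or per-vehicle folds are a more conservative design choice.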

2.7. Shapley Additive Explanation (SHAP) Based Feature Interpretation

SHAP integrates optimal credit allocation with localized explanations by applying the Shapley value concept from cooperative game theory. Specifically, SHAP provides an individual-level interpretation of a given prediction, f(x), by decomposing it into the sum of the baseline prediction and the feature contributions, as expressed in eq 15:

$$f(x) = \phi_0(f, x) + \sum_{k=1}^{K}\phi_k(f, x) \tag{15}$$

where $\phi_0(f, x)$ represents the expected value (mean) of the model predictions across the entire data set, and $\phi_k(f, x)$ denotes the Shapley value quantifying the contribution of the $k$th feature to the specific prediction $f(x)$, relative to the baseline $\phi_0(f, x)$. The Shapley value method distributes the contributions of explanatory variables fairly by averaging their marginal contributions across all possible feature subsets for each instance. The relative importance of the $k$th feature, $I_k$, is obtained by averaging its absolute Shapley values across all $n$ samples, as in eq 16:

$$I_k = \frac{1}{n}\sum_{i=1}^{n}\left|\phi_k(f, x_i)\right| \tag{16}$$
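To make eqs 15 and 16 concrete, the sketch below computes exact Shapley values by enumerating every feature coalition, setting absent features to a single reference point. This is an illustrative simplification: here φ0 equals the prediction at that reference point, whereas eq 15 defines φ0 as the mean prediction over the data set, and the SHAP library's `TreeExplainer` uses a polynomial-time algorithm specialized for tree ensembles. Enumeration is exponential in the number of features and only practical for a handful of them. The function names and toy model are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, reference):
    """Exact Shapley values of model f at point x (eq 15 decomposition).
    Features outside a coalition take the reference point's values."""
    K = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else reference[k] for k in range(K)]
        return f(z)

    phi = []
    for k in range(K):
        others = [j for j in range(K) if j != k]
        total = 0.0
        for size in range(K):
            # Shapley weight |S|! (K - |S| - 1)! / K! for coalitions S of this size
            weight = factorial(size) * factorial(K - size - 1) / factorial(K)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {k}) - value(s))  # marginal contribution
        phi.append(total)
    return phi


def mean_abs_shap(phi_rows):
    """Eq 16: mean absolute Shapley value of each feature over n explained samples."""
    n = len(phi_rows)
    return [sum(abs(row[k]) for row in phi_rows) / n for k in range(len(phi_rows[0]))]


def toy_model(z):
    """Hypothetical model with one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    phi = exact_shapley(toy_model, [1.0, 2.0, 3.0], reference=[0.0, 0.0, 0.0])
    print(phi)  # approximately [2.0, 3.0, 3.0]; sum equals f(x) - f(reference) = 8
```

The interaction term z1·z2 contributes 6 to the prediction and is split equally between the two features, illustrating the fairness property mentioned above.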

Subsequently, the feature importances for predicting CO2 emissions are aggregated across all nine input variables and normalized to relative importance, as indicated in eqs 17 and 18:

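$$I_{\text{total}} = \sum_{k=1}^{9} I_k = I_{\text{speed}} + I_{\text{accel./decel.}} + I_{\text{fuel eff.}} + I_{\text{mileage}} + I_{\text{engine disp.}} + I_{\text{age}} + I_{\text{weight}} + I_{\text{AFR}} + I_{\text{altitude}} \tag{17}$$
$$\bar{I}_k = \frac{I_k}{I_{\text{total}}} \times 100\% \tag{18}$$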

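Eqs 17 and 18 amount to dividing each feature's mean absolute SHAP value by the total over all features and expressing the result as a percentage. The values in the example are placeholders, not results from this study, and the function name is illustrative.

```python
def normalized_importance(mean_abs_shap: dict[str, float]) -> dict[str, float]:
    """Eqs 17-18: relative importance of each feature as a percentage of the total."""
    total = sum(mean_abs_shap.values())  # eq 17
    if total == 0:
        raise ValueError("all importances are zero")
    return {name: 100.0 * value / total for name, value in mean_abs_shap.items()}  # eq 18


if __name__ == "__main__":
    placeholder = {"speed": 3.0, "AFR": 1.0}  # illustrative values only
    print(normalized_importance(placeholder))  # {'speed': 75.0, 'AFR': 25.0}
```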
The SHAP dependence matrix captures the relationship between specific features and their contributions, illustrating the direction of their impact on the predicted CO2 emissions. Each point in the plot corresponds to an observation in the data set; positive SHAP values indicate that the feature increases the predicted emissions (Figure 6), whereas negative values indicate a suppressive influence.

Figure 6. SHAP summary plot illustrating the relative influence of features on predicted emissions.

3. Results and Discussion

3.1. Effect of Operational Condition on Real-World Emissions

A preliminary statistical analysis was conducted to evaluate key operational features and CO2 emissions of ITVs under loaded (with container payloads) and off-loaded (empty containers or negligible payload) operating conditions, as presented in Table . Vehicles operating under load exhibited a lower mean speed (7.12 km/h) than under off-loaded conditions (10.31 km/h), likely reflecting more restricted mobility during cargo transport in port settings. The mean acceleration was close to zero in both cases, consistent with frequent idling and stop-and-go behavior. Off-loading operations showed higher average speed and acceleration (Figure S2), with vehicles spending more time in cruise mode (57.31%) and less in idle (23.72%) than during loading operations (Table S3). Conversely, loading conditions were characterized by significantly higher idle time (36.41%) and lower speeds, suggesting more frequent stops and slower manoeuvring during cargo handling. Fuel consumption differed notably between load and off-load conditions, as shown in Figure S4: the load scenario exhibited a longer right tail (>10 L/h) in the Gaussian kernel density estimate, suggesting periods of higher, less efficient fuel usage. Mean fuel consumption was markedly higher under load (5.32 L/h) than under off-load conditions (3.61 L/h). Similarly, CO2 emissions under load averaged 12.79 kg/h, approximately 1.5 times the 8.32 kg/h recorded under off-load conditions. The fleet operating under load was older (mean age 98.66 vs 70.54 months) and had higher cumulative mileage (44.57 × 10³ vs 36.75 × 10³ km) than the off-loaded fleet, potentially contributing to higher fuel consumption and emissions. Both conditions involved engines of similar displacement and similar mean AFR (21.54 vs 21.80), with minor differences in AFR skewness suggesting subtle differences in combustion dynamics. Changes in altitude were minimal and symmetrically distributed over the level terrain of port corridors.

Table 4. Descriptive Statistics of Operational and Vehicular Attributes in Real-World Conditions.

(a) Load Condition

| variable (N obs., 14 ITVs) | mean | std dev. | min | Q1 (25%) | Q3 (75%) | max | variance | skewness |
|---|---|---|---|---|---|---|---|---|
| speed (km h⁻¹) | 7.12 | 6.97 | 0 | 0 | 11.99 | 30.5 | 48.59 | 0.63 |
| acceleration (m s⁻²) | 0 | 0.21 | −1.4 | 0 | 0.1 | 2 | 0.04 | −0.63 |
| fuel efficiency (L h⁻¹) | 5.32 | 3.16 | 0 | 1.85 | 6.72 | 29.74 | 9.97 | 0.67 |
| mileage (10³ km) | 44.57 | 9.15 | 37.41 | 37.9 | 56.52 | 62.36 | 83.61 | 1.02 |
| engine displacement (cm³) | 5719.67 | 113.89 | 5600 | 5600 | 5900 | 5900 | 12.74 | 0.65 |
| age (months) | 98.66 | 17.86 | 74 | 84.3 | 117 | 124 | 319.2 | −0.03 |
| AFR | 21.54 | 5.81 | 14 | 17.64 | 25.11 | 33.6 | 33.74 | 0.68 |
| Δaltitude (m) | −0.04 | 0.72 | −4.8 | −0.3 | 0.2 | 4.8 | 0.52 | 0.58 |
| CO2 (kg h⁻¹) | 12.79 | 8.79 | 0 | 4.93 | 18.66 | 36.71 | 77.24 | 0.63 |

(b) Off-Load Condition

| variable (N obs.) | mean | std dev. | min | Q1 (25%) | Q3 (75%) | max | variance | skewness |
|---|---|---|---|---|---|---|---|---|
| speed (km h⁻¹) | 10.31 | 7.33 | 0 | 2.82 | 15.71 | 34.53 | 53.77 | 0.07 |
| acceleration (m s⁻²) | 0 | 0.12 | −1.5 | 0 | 0 | 1.9 | 0.01 | 0.43 |
| fuel efficiency (L h⁻¹) | 3.61 | 2.1 | 0 | 1.52 | 4.52 | 17.29 | 4.39 | 1.45 |
| mileage (10³ km) | 36.75 | 7.78 | 23.79 | 29.07 | 43.58 | 47.22 | 60.85 | −0.1 |
| engine displacement (cm³) | 5755.29 | 117.58 | 5600 | 5700 | 5900 | 5900 | 13.2 | 0.2 |
| age (months) | 70.54 | 21.74 | 42 | 53 | 85.2 | 106 | 472.6 | 0.27 |
| AFR | 21.8 | 6.82 | 14 | 14 | 26.56 | 33.61 | 46.56 | 0.15 |
| Δaltitude (m) | −0.01 | 0.59 | −4.9 | −0.1 | 0.1 | 4.7 | 0.35 | −0.32 |
| CO2 (kg h⁻¹) | 8.32 | 5.61 | 0 | 3.98 | 11.27 | 34.81 | 31.47 | 1.46 |
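The relative differences discussed in Section 3.1 can be recomputed directly from the Table 4 means. In this small sketch (the dictionary keys are our own labels), the mean fuel rate under load comes out about 47% higher than under off-load conditions and the mean CO2 rate about 54% higher, i.e., roughly 1.5 times.

```python
"""Relative differences between the load and off-load means reported in Table 4."""

# Mean values copied from Table 4: (a) load and (b) off-load conditions.
LOAD = {"speed_kmh": 7.12, "fuel_L_per_h": 5.32, "co2_kg_per_h": 12.79,
        "mileage_1e3_km": 44.57, "age_months": 98.66}
OFF_LOAD = {"speed_kmh": 10.31, "fuel_L_per_h": 3.61, "co2_kg_per_h": 8.32,
            "mileage_1e3_km": 36.75, "age_months": 70.54}


def percent_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


for key in LOAD:
    print(f"{key}: {percent_change(LOAD[key], OFF_LOAD[key]):+.1f}% (load vs off-load)")
```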

3.2. Multicollinearity Analysis

Multicollinearity affects model interpretability and stability by making it difficult to isolate the individual impact of independent variables; it can distort coefficient estimates and reduce the statistical significance of predictors. Table presents the Variance Inflation Factor (VIF) and tolerance (the reciprocal of VIF) for the selected input features. Speed, acceleration, AFR, and altitude exhibit low VIFs (≤1.22), indicating minimal multicollinearity. The data set was iteratively pruned by removing highly collinear features in regression models until all variables exhibited VIF values below 10. Fuel efficiency and CO2 emissions were strongly correlated and showed high VIFs (>7), suggesting redundancy (Figure ). A slight trend indicated that larger engines were associated with older vehicles. A moderate link between speed and CO2 suggests that speed contributes to emission variability, while mileage, engine size, and age exhibited intercorrelations (r = 0.5–0.7) alongside acceptable VIFs.

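Table 5. Multicollinearity Analysis.

| feature | speed | accel. | fuel eff. | mileage | engine | age | weight | AFR | altitude | CO2 |
|---|---|---|---|---|---|---|---|---|---|---|
| VIF | 1.22 | 1 | 8.3 | 3.84 | 4.11 | 4.71 | 3.62 | 1.11 | 1 | 7.69 |
| tolerance | 0.82 | 1 | 0.12 | 0.26 | 0.24 | 0.21 | 0.28 | 0.9 | 1 | 0.13 |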
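The VIF and tolerance values in Table 5 follow from auxiliary regressions: each feature is regressed on all the others, VIF_j = 1/(1 − R_j²), and tolerance = 1/VIF (for example, 1/8.3 ≈ 0.12 for fuel efficiency). The dependency-free sketch below solves the normal equations directly; in practice a statistics package would normally be used, and the function names here are hypothetical, since the source does not state which implementation the authors used.

```python
"""VIF and tolerance from auxiliary OLS regressions (pure-Python sketch)."""
from typing import Dict, List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(design[i][j] * design[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif_table(features: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Map each feature name to (VIF, tolerance), with VIF = 1 / (1 - R^2) and tolerance = 1 / VIF."""
    result = {}
    for name, column in features.items():
        others = [col for other, col in features.items() if other != name]
        tolerance = 1.0 - r_squared(column, others)
        vif = 1.0 / tolerance if tolerance > 0 else float("inf")
        result[name] = (vif, tolerance)
    return result
```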

3.3. Nonlinear Interactions of Predictors in ML

The XGBoost model showed high predictive performance, as indicated by steadily decreasing RMSE with larger training sets and close alignment between training and validation errors (Figure ). Although some dispersion was observed at higher predicted values, the residuals remained homoscedastic. Table S2 summarizes speed-bin-wise fuel use and CO2 emission characteristics under real-world load conditions. In the 0–10 km/h range, emissions remained relatively low, with a mean of 6.98 kg/h, though occasional spikes were noted, likely due to idling and stop-and-go driving. Emissions increased sharply in the 10–20 km/h range (mean 11.88 kg/h), corresponding to initial acceleration. In the 20–30 km/h range, mean emissions decreased but with peaks up to 36.71 kg/h, indicating high variability. Emissions were highest beyond 30 km/h, with a mean of 18.24 kg/h, consistent with elevated fuel demand at sustained speeds. Overall, emissions increased with speed (Figure ), with the most pronounced and consistent emissions observed at speeds exceeding 25 km/h.
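A speed-bin summary like the one in Table S2 amounts to grouping samples by speed interval and aggregating the CO2 rate within each group. The sketch below assumes bins that include their lower edge (0–10, 10–20, 20–30, and >30 km/h) and reports the count, mean, and maximum per bin; the bin convention and function names are assumptions, not the authors' code.

```python
"""Speed-bin summary of CO2 emission rate, in the spirit of Table S2 (illustrative sketch)."""
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Bin edges in km/h (lower edge inclusive); the last bin is open-ended (>30 km/h).
BINS: List[Tuple[float, float]] = [(0, 10), (10, 20), (20, 30), (30, float("inf"))]


def bin_label(lo: float, hi: float) -> str:
    """Human-readable label such as '0-10' or '>30'."""
    return f">{lo:g}" if hi == float("inf") else f"{lo:g}-{hi:g}"


def speed_bin_summary(speed: Sequence[float], co2: Sequence[float]) -> Dict[str, Dict[str, float]]:
    """Group samples by speed bin and report count, mean and max CO2 for each non-empty bin."""
    groups: Dict[str, List[float]] = {bin_label(lo, hi): [] for lo, hi in BINS}
    for v, e in zip(speed, co2):
        for lo, hi in BINS:
            if lo <= v < hi:
                groups[bin_label(lo, hi)].append(e)
                break
    return {label: {"n": len(vals), "mean": mean(vals), "max": max(vals)}
            for label, vals in groups.items() if vals}


# Usage with hypothetical 1 Hz records (speed in km/h, CO2 in kg/h):
print(speed_bin_summary([0.0, 5.0, 12.0, 25.0, 31.0], [1.0, 3.0, 10.0, 20.0, 18.0]))
```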

Figure 7. (a) Learning curve of the XGB regressor. (b) Training and testing RMSE progression across epochs.

Figure 10. Spatial interpretation of vehicle speed, fuel efficiency, and partial dependence of predicted CO2 emissions (kg h⁻¹) under (a) off-load and (b) load operations at the port terminal.

Predicted CO2 emissions show strong correspondence with real-world driving operations. Fuel consumption, speed, load proxies (mileage, age, engine displacement), and acceleration events emerge as key emission drivers, as shown in Figure . Model-predicted CO2 emissions from older vehicles averaged 14.72 kg/h, approximately 76% higher than those of newer vehicles. This trend is likely attributable to progressive engine wear and declining mechanical efficiency with vehicle utilization. The variability in emissions among middle-aged vehicles may result from differences in maintenance, usage intensity, and operating conditions. In port operations, the terrain remained predominantly level, with few gradients or altitude variations, thereby minimizing the influence of altitude on emission levels. Emissions were slightly elevated under ascent conditions (10.75 kg/h) relative to level operation; however, the overall effect of altitude on emissions was minimal, as fuel demand remained relatively stable over the level terrain of port corridors. Average CO2 emissions increased by 39.43% from lower to higher engine displacements. Notably, substantial emission variability was observed even among vehicles with similar engine size and weight, indicating the influence of operational factors such as idle time, gear usage, vehicle age, and maintenance condition. AFR showed high variability at lower speeds, averaging 21.83 at 10–20 km/h and settling into a stable range at 20–30 km/h. Emissions increased progressively across mileage bins, likely due to elevated fuel consumption and reduced efficiency associated with vehicle age and usage. Low-mileage vehicles (15,000–35,000 km) averaged 6.44 kg/h; emissions were about 35% higher for moderately used vehicles (35,000–50,000 km) and nearly 86% higher for heavily used vehicles (>50,000 km).
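Figure 10 refers to the partial dependence of predicted CO2 emissions. A one-dimensional partial dependence curve can be sketched as follows: sweep one feature over a grid, overwrite it in every row of the data, and average the model's predictions. The model argument is a placeholder callable (for example, a fitted regressor's prediction wrapped as a function), and this generic sketch is not the authors' procedure.

```python
"""One-dimensional partial dependence: average prediction as one feature is swept over a grid."""
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def partial_dependence(f: Model, data: Sequence[Sequence[float]], feature: int,
                       grid: Sequence[float]) -> List[float]:
    """For each grid value g, set column `feature` to g in every row and average f over the rows."""
    curve = []
    for g in grid:
        preds = []
        for row in data:
            z = list(row)       # copy so the original data are not modified
            z[feature] = g
            preds.append(f(z))
        curve.append(sum(preds) / len(preds))
    return curve


# Usage with a hypothetical model of CO2 (kg/h) from fuel rate (L/h) and age (months):
toy = lambda z: 2.6 * z[0] + 0.01 * z[1]
rows = [[3.6, 70.0], [5.3, 99.0], [4.1, 85.0]]
print(partial_dependence(toy, rows, feature=1, grid=[60.0, 90.0, 120.0]))
```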

Figure 8. Predicted CO2 emission trends under varying real-world operational conditions and vehicle characteristics.

3.4. Feature Dependence SHAP Analysis with Associated Contribution

To assess the influence of individual features on model output, SHAP values were computed for each feature across the data set. SHAP assigns each feature an importance value that reflects both the magnitude and the direction of its contribution to the model output (Figure S10). The SHAP dependence plots highlight the varied influence of each operational predictor on emissions, as shown in Figure . Each panel shows the distribution of SHAP values for an individual feature across the data set, with color intensity representing feature statistics, indicating potential interactions within the model. In each panel, the x-axis represents the feature values and the y-axis the corresponding SHAP values. These interactions underline the multifactorial and complex trends under real-world driving conditions.

Figure 9. SHAP-based interpretation of model-predicted real-world CO2 emissions, highlighting the relative importance and directional impact of operational features.

Higher mileage values generally corresponded to slightly positive SHAP values, indicating that vehicle wear over time may contribute to increased emissions. SHAP values for engine displacement formed clustered bands, likely reflecting the small number of distinct engine sizes in the fleet (5600–5900 cm³). Fuel rate showed a strong, nearly linear positive relationship with predicted CO2 emissions (Table S5), indicating that higher fuel usage directly corresponds to elevated emissions. Engine size and age both influenced emissions, with larger engines showing greater SHAP contributions and older vehicles emitting more CO2, consistent with reduced efficiency and outdated designs. Altitude change contributed minimally, with SHAP values centered around zero. At extreme acceleration/deceleration (AD) values (sharp acceleration or braking), slight increases in SHAP values were observed, suggesting transient operating conditions likely associated with congestion, which may modulate the influence of other features such as mileage, AFR, and engine displacement in port environments.

4. Performance Comparison of Model Outputs

Performance metrics were used to evaluate the predictive accuracy and generalization of each model, enabling a comparison of emission modeling approaches and validation of the results. RMSE, MAE, and R² values for the training and testing sets are compared across models in Table . The MLR model performed consistently across data sets, indicating a reasonable ability to generalize, although a minor drop in R² (0.85 to 0.83) suggested slight overfitting. LASSO regression achieved lower RMSE than MLR (3.67 vs 3.87 on the testing set) and a smaller gap between training and testing RMSE, indicating slightly better generalization, although its MAE was marginally higher and its R² lower. Bayesian regression yielded lower RMSE than MLR (2.14 on the testing set) with similarly small train–test differences, indicating relatively stable predictive performance, although its MAE was higher and its R² lower (0.71). The Random Forest model performed strongly on the training set, indicating a high capacity to fit the data; however, the increase in RMSE (1.78 to 1.91) and decline in R² (0.94 to 0.91) on the testing set indicated a moderate degree of overfitting. Despite this, Random Forest outperformed MLR, LASSO, and Bayesian regression in predictive accuracy on both the training and testing sets, underscoring its suitability for large-data applications. Actual-versus-predicted emissions (5-fold cross-validation) and residual error distributions for these models are shown in Figures S6–S9. The hyperparameter-tuned XGBoost model exhibited stable and reliable performance across both data sets and improved on Random Forest, reducing RMSE by 12.5% and MAE by 7.4%. Overall, the XGBoost model showed the highest predictive accuracy under real-world conditions (testing R² = 0.95), balancing high accuracy with low error.
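The actual-versus-predicted comparisons in Figures S6–S9 rely on 5-fold cross-validation. The sketch below shows the splitting logic in plain Python; a one-predictor least-squares line stands in for the study's models (MLR, LASSO, Bayesian, Random Forest, XGBoost), and the shuffling seed, fold construction, and function names are illustrative assumptions.

```python
"""Plain k-fold cross-validation (k = 5) with a stand-in linear model; illustrative sketch."""
import math
import random
from typing import List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once, deal it into k near-equal folds, and return (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for fold in folds[:i] + folds[i + 1:] for j in fold], folds[i]) for i in range(k)]


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for a single predictor (stand-in model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def cv_rmse(x: Sequence[float], y: Sequence[float], k: int = 5, seed: int = 0) -> List[float]:
    """Held-out RMSE of the stand-in model for each of the k folds."""
    scores = []
    for train, test in kfold_indices(len(x), k, seed):
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        sq_err = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in test]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return scores


# Usage with hypothetical fuel-rate (L/h) and CO2 (kg/h) pairs:
fuel = [1.2, 2.5, 3.1, 4.0, 4.8, 5.5, 6.3, 7.0, 8.2, 9.1]
co2 = [3.3, 6.6, 8.4, 10.5, 12.9, 14.6, 16.9, 18.4, 21.9, 24.0]
print(cv_rmse(fuel, co2))
```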

Table 6. Model Performance Metrics.

| model | data | approach | RMSE | MAE | R² |
|---|---|---|---|---|---|
| MLR | training set | linear regression | 3.43 | 1.61 | 0.85 |
| | testing set | | 3.87 | 1.63 | 0.83 |
| Lasso | training set | regularized linear regression (L1 regularization) | 3.41 | 1.64 | 0.81 |
| | testing set | | 3.67 | 1.67 | 0.79 |
| Bayesian | training set | probabilistic linear regression | 2.07 | 1.79 | 0.74 |
| | testing set | | 2.14 | 1.81 | 0.71 |
| Random Forest | training set | ensemble, bagging, decision tree-based | 1.78 | 0.54 | 0.94 |
| | testing set | | 1.91 | 0.57 | 0.91 |
| XGB Regressor | training set | ensemble, boosting, decision tree-based | 1.59 | 0.51 | 0.97 |
| | testing set | | 1.63 | 0.53 | 0.95 |

Residuals showed minimal bias and were homoscedastic, supporting the model's reliability in estimating emissions from large real-time data sets (Figure ). Achieving both high accuracy and interpretability is critical for freight management and emission characterization. Beyond predicting CO2 emissions and energy use, the proposed framework integrates predictive capabilities with spatial analysis, enabling emission intensities to be mapped across locations and hotspots to be identified, as shown in Figure . Applied to traffic regulation in high-impact areas such as highways and intersections, this approach can identify emission-dense zones that may pose exposure risks to pedestrians. Predicted emissions under real-world conditions closely reflected the observed trends, and the model's interpretability further strengthens its understanding and applicability under complex conditions.
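Hotspot identification of the kind described above can be approximated by aggregating predicted emissions on a spatial grid and ranking the cells. In this sketch the cell size (0.001°, roughly 100 m in latitude) and the top-10% threshold are hypothetical choices, and the inputs are assumed to be per-record latitude, longitude, and predicted CO2 rate; the study's actual spatial method may differ.

```python
"""Grid-cell aggregation of predicted emissions to flag hotspots (illustrative sketch)."""
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float) -> Cell:
    """Integer index of the square lat/lon cell (edge length cell_deg degrees) containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cell_means(lats: Sequence[float], lons: Sequence[float], co2: Sequence[float],
               cell_deg: float = 0.001) -> Dict[Cell, float]:
    """Mean predicted CO2 rate (kg/h) for every occupied grid cell."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for la, lo, e in zip(lats, lons, co2):
        c = grid_cell(la, lo, cell_deg)
        sums[c] += e
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def hotspots(means: Dict[Cell, float], top_fraction: float = 0.1) -> List[Cell]:
    """Cells ranked in the top `top_fraction` of occupied cells by mean rate (at least one cell)."""
    ranked = sorted(means, key=lambda c: means[c], reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return ranked[:k]


# Usage with hypothetical predictions near a terminal gate:
m = cell_means([13.0951, 13.0952, 13.0968], [80.2951, 80.2953, 80.2990], [14.2, 16.8, 6.1])
print(hotspots(m, top_fraction=0.5))
```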

5. Limitations and Future Research

This study evaluates the CO2 intensity associated with heavy-duty vehicle (HDV) operations within port logistics. Overall, the findings underscore the need to optimize port operations, vehicle performance, and driving behavior to improve efficiency and minimize environmental impacts. Future research may focus on advanced model tuning, integration of zero-emission technologies, and exploration of policy and technological pathways for low-carbon freight systems. The model can be integrated with dispersion modeling frameworks to improve accuracy under extreme conditions, leading to the development of high-precision hybrid models. Further improvements in emission modeling would benefit from detailed spatial data, road networks, and infrastructure information. Moreover, future studies should evaluate particulate matter and other harmful gases, given their major role in urban air pollution and health impacts.

6. Conclusions and Policy Implications

In the present study, real-world operational factors such as speed, mileage, altitude, air–fuel ratio, acceleration/deceleration, engine displacement, and weight were found to substantially influence emission outcomes. Operation under load resulted in a 64% increase in CO2 emissions, while acceleration phases amplified emissions by 89.5%, underlining the substantial energy requirements of port-based freight operations. Machine learning models were employed to assess emission prediction capabilities. The XGBoost model showed stable and robust predictive performance, with strong R² metrics indicating that it effectively captures variance in the data and is suitable for real-world use. While Random Forest achieved similar accuracy, it showed signs of overfitting, whereas the other regression models offered moderate yet generalizable results. Moreover, the findings revealed that vehicle usage and age influence emission variability. Vehicles with mileage of 50,000–75,000 km exhibited a 35% increase in emissions, while older fleets (with an average age difference of about 5 years) emitted CO2 at rates nearly 76% higher than newer vehicles (8.31 kg h⁻¹). Additionally, loaded operations showed significantly higher average fuel consumption (5.32 vs 3.61 L/h), a 47% increase over off-load conditions. Furthermore, reduced acceleration variability (|a| < 2 m s⁻²), indicative of smoother driving behavior, contributed to fuel-efficiency gains and emission reductions of up to 23%.

The results underscore the considerable ecological footprint of freight-sector activities. The European Commission's adoption of CO2 emission standards for heavy-duty vehicles marks a paradigm shift in transport decarbonization. The policy underscores the role of data-driven compliance, technological innovation, and regulatory accountability in achieving long-term climate neutrality, and this framework serves as a benchmark for developing nations to establish structured emission reduction targets and performance-based incentive systems for freight transport. Under the National Clean Air Programme (NCAP), over 108 cities have been identified as nonattainment areas, and the government and local authorities have been implementing a range of strategies to curb vehicular pollution in urban regions. Energy-efficient practices, such as the adoption of cleaner fuels and electrification, can help reduce the carbon and pollution footprints of logistics operations, as reflected in initiatives like Harit Sagar under the green port guidelines. Given the significant contribution of aged, high-mileage, and inadequately maintained heavy-duty vehicles to emissions under real-world driving conditions, it is essential to strengthen vehicle inspection systems to better monitor and track these vehicles. Furthermore, decarbonizing road transportation will require the deployment of advanced solutions, such as solar-powered electrified roads within terminal areas, which can support net-zero emission goals and reduce dependence on fossil fuels.

Supplementary Material

ao5c12136_si_001.pdf (1.9MB, pdf)

Acknowledgments

The authors would like to thank the officers and staff of the Chennai Port Authority for their kind permission and Shakti Sustainable Energy Foundation for their valuable support in facilitating this study. The authors also acknowledge the use of the High-Performance Computing Environment (HPCE) at IIT Madras, along with the institutional support and facilities provided for carrying out this research.

Data, including the code repository of Python scripts for the ML models, are available in the model source file at doi: 10.5281/zenodo.17043313.

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.5c12136.

  • Commodity-wise cargo traffic trends at major Indian ports (Figure S1); specifications of the standard ISO 20-ft container (Table S1); speed-bin-wise fuel consumption and CO2 emission characteristics under loaded and off-loaded conditions (Table S2); driving phase distribution during transfer operations (Table S3); ANOVA results for speed-bin fuel consumption and CO2 emissions under load and off-load conditions (Table S4); instantaneous relationships among speed, acceleration/deceleration, fuel consumption, and CO2 emissions (Figure S2); statistical distribution and normality analysis of fuel consumption and CO2 emissions (Figure S3); kernel density estimation of fuel consumption (Figure S4); CO2 emission density distribution with normal fit (Figure S5); evaluation of MLR, Lasso, Bayesian Ridge, and Random Forest models for actual vs predicted emissions (Figures S6–S9); and SHAP feature importance and contribution analysis (Table S5 and Figure S10) (PDF)

The authors declare no competing financial interest.

References

  1. International Energy Agency (IEA) . Future of Trucks: Implications for Energy and the Environment. International Energy Agency, 2017. Available at: https://www.iea.org/reports/the-future-of-trucks (accessed January 24, 2025).
  2. PPAC (Petroleum Planning and Analysis Cell) and Nielsen . All India study on sectoral demand of diesel and petrol; Ministry of Petroleum and Natural Gas, Government of India: Delhi, 2013. [Google Scholar]
  3. Malik L., Tiwari G.. Assessment of interstate freight vehicle characteristics and impact of future emission and fuel economy standards on their emissions in India. Energy Policy. 2017;108:121–133. doi: 10.1016/j.enpol.2017.05.053. [DOI] [Google Scholar]
  4. Prohaska R.. et al. Heavy-Duty Vehicle Port Drayage Drive Cycle Characterization and Development’. SAE Int. J. Commer. Veh. 2016;9(2):331–338. doi: 10.4271/2016-01-8135. [DOI] [Google Scholar]
  5. International Institute for Applied Systems Analysis (IIASA) , Greenhouse Gas and Air Pollution Interactions and Synergies (GAINS), 2015. [online] Available at: https://iiasa.ac.at/models-tools-data/gains (accessed July 21, 2024).
  6. U.S. Environmental Protection Agency (EPA) . Fuel Economy Labeling of Motor Vehicles: Final Rule, 2006. EPA-420-R-06–017, Washington, D.C. [Google Scholar]
  7. Katreddi S., Thiruvengadam A.. Trip based modeling of fuel consumption in modern heavy-duty vehicles using artificial intelligence. Energies. 2021;14(24):8592. doi: 10.3390/en14248592. [DOI] [Google Scholar]
  8. European Commission . Heavy-duty vehicles. In Climate Action, 2019. Retrieved November 24, 2024, from https://climate.ec.europa.eu/eu-action/transport-decarbonisation/road-transport/heavy-duty-vehicles_en. [Google Scholar]
  9. McNeil W. H., Porzio J., Tong F., Harley R. A., Auffhammer M., Scown C. D.. Impact of truck electrification on air pollution disparities in the United States. Nature Sustainability. 2025;8:276. doi: 10.1038/s41893-025-01515-x. [DOI] [Google Scholar]
  10. Hricko A., Rowland G., Logan S., Taher A., Wilson J.. Global trade, local impacts: Lessons from California on health impacts and environmental justice concerns for residents living near freight rail yards. Int. J. Environ. Res. Publ. Health. 2014;11(2):1914–1941. doi: 10.3390/ijerph110201914. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Lee H., Pham H. T., Kim C., Lee K.. A Study on emissions from drayage trucks in the port city-focusing on the port of Incheon. Sustainability (Switzerland) 2019;11(19):5358. doi: 10.3390/su11195358. [DOI] [Google Scholar]
  12. Li Z., Zhao P., He Z., Xiao Z.. Non-linear effects of CO2 emissions from road transport in port landside area. Transp. Res. Part D. 2024;132:104264. doi: 10.1016/j.trd.2024.104264. [DOI] [Google Scholar]
  13. U.S. EPA . Carbon Pollution from Transportation, 2023. Available at: https://www.epa.gov/transportation-air-pollution-and-climate-change/carbon-pollution-transportation (accessed June 27, 2024).
  14. Huang Y., Ng E. C. Y., Zhou J. L., Surawski N. C., Chan E. F. C., Hong G.. Eco-driving technology for sustainable road transport: A review. Renewable Sustainable Energy Rev. 2018;93:596–609. doi: 10.1016/j.rser.2018.05.030. [DOI] [Google Scholar]
  15. Zhang T., Jin T., Qi J., Liu S., Hu J., Wang Z., Li Z., Mao H., Xu X.. Influence of test cycle and fuel property on fuel consumption and exhaust emissions of a heavy-duty diesel engine. Energy. 2022;244:122705. doi: 10.1016/j.energy.2021.122705. [DOI] [Google Scholar]
  16. Jin Y., Sharifi A., Li Z., Chen S., Zeng S., Zhao S.. Carbon emission prediction models: A review. Sci. Total Environ. 2024;(927):172319. doi: 10.1016/j.scitotenv.2024.172319. [DOI] [PubMed] [Google Scholar]
  17. Tian Y., Ren X., Li K., Li X.. Carbon Dioxide Emission Forecast: A Review of Existing Models and Future Challenges. Sustainability (Basel) 2025;17(4):1471. doi: 10.3390/su17041471. [DOI] [Google Scholar]
  18. Chiu Y. J., Hu Y. C., Jiang P., Xie J., Ken Y. W.. A Multivariate Grey Prediction Model Using Neural Networks with Application to Carbon Dioxide Emissions Forecasting. Mathematical Problems in Engineering. 2020;2020:1. doi: 10.1155/2020/8829948. [DOI] [Google Scholar]
  19. Qiao Q., Eskandari H., Saadatmand H., Sahraei M. A.. An interpretable multi-stage forecasting framework for energy consumption and CO2 emissions for the transportation sector. Energy. 2024;286:129499. doi: 10.1016/j.energy.2023.129499. [DOI] [Google Scholar]
  20. Ye L., Xie N., Hu A.. A novel time-delay multivariate grey model for impact analysis of CO2 emissions from China’s transportation sectors. Applied Mathematical Modelling. 2021;91:493–507. doi: 10.1016/j.apm.2020.09.045. [DOI] [Google Scholar]
  21. Çinarer G., Yeşilyurt M. K., Aǧbulut Ü., Yilbaşi Z., Kiliç K. I.. Application of various machine learning algorithms in view of predicting the CO2 emissions in the transportation sector. Sci. Technol. Energy Transition (STET) 2024;79:15. doi: 10.2516/stet/2024014. [DOI] [Google Scholar]
  22. Nagendra S. M. S., Khare M.. Artificial neural network based line source models for vehicular exhaust emission predictions of an urban roadway. Transp. Res. Part D. 2004;9:199–208. doi: 10.1016/j.trd.2004.01.002. [DOI] [Google Scholar]
  23. Gurcan F.. Forecasting CO2 emissions of fuel vehicles for an ecological world using ensemble learning, machine learning, and deep learning models. PeerJ. Comput. Sci. 2024;10:e2234. doi: 10.7717/PEERJ-CS.2234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Alam G. M. I., Arfin Tanim S., Sarker S. K., Watanobe Y., Islam R., Mridha M. F., Nur K.. Deep learning model-based prediction of vehicle CO2 emissions with eXplainable AI integration for sustainable environment. Sci. Rep. 2025;15(1):3655. doi: 10.1038/s41598-025-87233-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Al-Nefaie A. H., Aldhyani T. H. H.. Predicting CO2 Emissions from Traffic Vehicles for Sustainable and Smart Environment Using a Deep Learning Model. Sustainability (Switzerland) 2023;15(9):7615. doi: 10.3390/su15097615. [DOI] [Google Scholar]
  26. International Council on Clean Transportation . Global Transportation Trends: Heavy-Duty Vehicles, 2018. Retrieved from https://theicct.org/publications/global-transportation-trends-hdvs.
  27. Kellner F.. Exploring the impact of traffic congestion on CO2 emissions in freight distribution networks. Logistics Research. 2016;9(1):1–15. doi: 10.1007/s12159-016-0148-5. [DOI] [Google Scholar]
  28. McKinnon A., Edwards J., Piecyk M., Palmer A.. Traffic congestion, reliability and logistical performance: A multi-sectoral assessment. International Journal of Logistics Research and Applications. 2009;12(5):331–345. doi: 10.1080/13675560903181519. [DOI] [Google Scholar]
  29. Li X., Ren A., Li Q.. Exploring Patterns of Transportation-Related CO2 Emissions Using Machine Learning Methods. Sustainability (Switzerland) 2022;14(8):4588. doi: 10.3390/su14084588. [DOI] [Google Scholar]
  30. Udoh J., Lu J., Xu Q.. Application of Machine Learning to Predict CO2 Emissions in Light-Duty Vehicles. Sensors. 2024;24(24):8219. doi: 10.3390/s24248219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  31. Xu J., Saleh M., Hatzopoulou M.. A machine learning approach capturing the effects of driving behaviour and driver characteristics on trip-level emissions. Atmos. Environ. 2020;224:117311. doi: 10.1016/j.atmosenv.2020.117311. [DOI] [Google Scholar]
  32. Huang S., Xiao X., Guo H.. A novel method for carbon emission forecasting based on EKC hypothesis and nonlinear multivariate grey model: evidence from transportation sector. Environmental Science and Pollution Research. 2022;29(40):60687–60711. doi: 10.1007/s11356-022-20120-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Wei N., Zhang Q., Zhang Y., Jin J., Chang J., Yang Z., Ma C., Jia Z., Ren C., Wu L., Peng J., Mao H.. Super-learner model realizes the transient prediction of CO2 and NOx of diesel trucks: Model development, evaluation and interpretation. Environ. Int. 2022;158:106977. doi: 10.1016/j.envint.2021.106977. [DOI] [PubMed] [Google Scholar]
  34. Mądziel M.. Liquified Petroleum Gas-Fuelled Vehicle CO2 Emission Modelling Based on Portable Emission Measurement System, On-Board Diagnostics Data, and Gradient-Boosting Machine Learning. Energies. 2023;16(6):2754. doi: 10.3390/en16062754. [DOI] [Google Scholar]
  35. Mądziel M.. Predictive methods for CO2 emissions and energy use in vehicles at intersections. Sci. Rep. 2025;15(1):6463. doi: 10.1038/s41598-025-91300-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Chen Y., Calabrese R., Martin-Barragan B.. Interpretable machine learning for imbalanced credit scoring datasets. European Journal of Operational Research. 2024;312(1):357–372. doi: 10.1016/j.ejor.2023.06.036. [DOI] [Google Scholar]
  37. Greenwell B. M.. pdp: An R Package for Constructing Partial Dependence Plots. R Journal. 2017;9(1):421–436. doi: 10.32614/RJ-2017-016. [DOI] [Google Scholar]
  38. Ullah I., Liu K., Yamamoto T., al Mamlook R. E., Jamal A.. A comparative performance of machine learning algorithm to predict electric vehicles energy consumption: A path towards sustainability. Energy and Environment. 2022;33(8):1583–1612. doi: 10.1177/0958305X211044998. [DOI] [Google Scholar]
  39. Alfaseeh L., Tu R., Farooq B., Hatzopoulou M.. Greenhouse Gas Emission Prediction on Road Network using Deep Sequence Learning. Transp. Res. Part D. 2020:102593. doi: 10.1016/j.trd.2020.102593. [DOI] [Google Scholar]
  40. Jaikumar R., Shiva Nagendra S. M., Sivanandan R.. Modeling of real time exhaust emissions of passenger cars under heterogeneous traffic conditions. Atmospheric Pollution Research. 2017;8(1):80–88. doi: 10.1016/j.apr.2016.07.011. [DOI] [Google Scholar]
  41. Mohammadnazar A., Khattak Z. H., Khattak A. J.. Assessing driving behavior influence on fuel efficiency using machine-learning and drive-cycle simulations. Transp. Res. Part D. 2024;126:104025. doi: 10.1016/j.trd.2023.104025. [DOI] [Google Scholar]
  42. Aras S., Hanifi Van M.. An interpretable forecasting framework for energy consumption and CO2 emissions. Appl. Energy. 2022;328:120163. doi: 10.1016/j.apenergy.2022.120163. [DOI] [Google Scholar]
  43. Li M., Sun H., Huang Y., Chen H.. Shapley value: from cooperative game to explainable artificial intelligence. Auton. Intell. Syst. 2024;4(1):1. doi: 10.1007/s43684-023-00060-8. [DOI] [Google Scholar]
  44. Port Management Authority . Chennai Port Development and Growth Prospects. In Annual Report, 2023. Available at: https://www.portmanagementauthority.org/annual-report-2023 (accessed Nov 17, 2024). [Google Scholar]
  45. Chennai Port Authority. Home, n.d. Available at: https://www.chennaiport.gov.in/home (accessed: Aug 24. 2024).
  46. International Maritime Organization (IMO). Annual Review of Maritime Transport, 2010. Retrieved from: https://www.imo.org/en/OurWork/Transport/Documents/Annual%20Review%20of%20Maritime%20Transport%202010.pdf (accessed July 5, 2024).
  47. Anttila P., Nummelin T., Väätäinen K., Laitila J., Ala-Ilomäki J., Kilpeläinen A. Effect of vehicle properties and driving environment on fuel consumption and CO2 emissions of timber trucking based on data from fleet management system. Transp. Res. Interdiscipl. Perspect. 2022;15:100671. doi: 10.1016/j.trip.2022.100671.
  48. AVL. User Manual AVL Ditest Gas 1000, 2015. https://www.avlditest.com/index.php/en/emt-mds-418.html (accessed September 14, 2024).
  49. Jaikumar R., Shiva Nagendra S. M., Sivanandan R. Modal analysis of real-time, real world vehicular exhaust emissions under heterogeneous traffic conditions. Transportation Research Part D: Transport and Environment. 2017;54:397–409. doi: 10.1016/j.trd.2017.06.015.
  50. U.S. Environmental Protection Agency (EPA). 40 CFR § 1065.655 Chemical balances of fuel, intake air, and exhaust, 2013. Available at: https://www.ecfr.gov/current/title-40/section-1065.655 (accessed March 3, 2024).
  51. Zhang S., Wu Y., Liu H., Huang R., Yang L., Li Z. Real-world fuel consumption and CO2 emissions of urban public buses in Beijing. Appl. Energy. 2014;113:1645–1655. doi: 10.1016/j.apenergy.2013.09.017.
  52. Wang, M. Q. GREET 1.5 – transportation fuel-cycle model – Vol. 1: methodology, development, use, and results; Argonne National Laboratory, 1996; p 218. http://www.osti.gov/energycitations/product.biblio.jsp?osti_id=14775.
  53. U.S. Environmental Protection Agency (EPA). Fuel economy, CO2 emissions, and carbon-related exhaust emissions calculations. In Code of Federal Regulations, Title 40, Section 600.113-12, 2012. Available at: https://www.ecfr.gov/current/title-40/chapter-I/subchapter-Q/part-600/subpart-B/section-600.113-12.
  54. Cutler D. R., Edwards T. C., Beard K. H., Cutler A., Hess K. T., Gibson J., Lawler J. J. Random forests for classification in ecology. Ecology. 2007;88(11):2783–2792. doi: 10.1890/07-0539.1.
  55. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 13–17, 2016; pp 785–794.
  56. Zhang C., Zou X., Lin C. Fusing XGBoost and SHAP Models for Maritime Accident Prediction and Causality Interpretability Analysis. J. Mar. Sci. Eng. 2022;10(8):1154. doi: 10.3390/jmse10081154.
  57. Houdou A., el Badisy I., Khomsi K., Abdala S. A., Abdulla F., Najmi H., Obtel M., Belyamani L., Ibrahimi A., Khalis M. Interpretable Machine Learning Approaches for Forecasting and Predicting Air Pollution: A Systematic Review. Aerosol Air Qual. Res. 2024;24(1):230151. doi: 10.4209/aaqr.230151.
  58. Sharma E., Deo R. C., Prasad R., Parisi A. V. A hybrid air quality early-warning framework: An hourly forecasting model with online sequential extreme learning machines and empirical mode decomposition algorithms. Sci. Total Environ. 2020;709:135934. doi: 10.1016/j.scitotenv.2019.135934.
  59. Singh Kushwah J., Kumar A., Patel S., Soni R., Gawande A., Gupta S. Comparative study of regressor and classifier with decision tree using modern tools. Materials Today: Proceedings. 2022;56:3571–3576. doi: 10.1016/j.matpr.2021.11.635.
  60. Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 3rd ed., 2025. christophm.github.io/interpretable-ml-book/.
  61. Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners, 2019; pp 59–64.
  62. Yang C., Chen M., Yuan Q. The application of XGBoost and SHAP to examining the factors in freight truck-related crashes: An exploratory analysis. Accid. Anal. Prev. 2021;158:106153. doi: 10.1016/j.aap.2021.106153.
  63. Shapley, L. A Value for n-Person Games. In Contributions to the Theory of Games II; Kuhn, H.; Tucker, A., Eds.; Princeton University Press: Princeton, 1953; pp 307–317.
  64. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions, 2017. http://arxiv.org/abs/1705.07874.
  65. Wang X., Song G., Zhai Z., Wu Y., Yin H., Yu L. Effects of vehicle load on emissions of heavy-duty diesel trucks: A study based on real-world data. Int. J. Environ. Res. Publ. Health. 2021;18(8):3877. doi: 10.3390/ijerph18083877.
  66. Quiros D. C., Smith J., Thiruvengadam A., Huai T., Hu S. Greenhouse gas emissions from heavy-duty natural gas, hybrid, and conventional diesel on-road trucks during freight transport. Atmos. Environ. 2017;168:36–45. doi: 10.1016/j.atmosenv.2017.08.066.
  67. Zhou Z., Qiu C., Zhang Y. A comparative analysis of linear regression, neural networks and random forest regression for predicting air ozone employing soft sensor models. Sci. Rep. 2023;13(1):22420. doi: 10.1038/s41598-023-49899-0.
  68. Deepika, Pandove G. Prediction of traffic time using XGBoost model with hyperparameter optimization. Multimedia Tools Appl. 2025:37045. doi: 10.1007/s11042-025-20646-z.
  69. Ranjan S., Saragur Madanayak S. N. Exploratory real-world emission modal assessment of three-wheeled autorickshaws in urban road networks. Transportation Research Part D: Transport and Environment. 2025;148:105006. doi: 10.1016/j.trd.2025.105006.
  70. Suarez J., Makridis M., Anesiadou A., Komnos D., Ciuffo B., Fontaras G. Benchmarking the driver acceleration impact on vehicle energy consumption and CO2 emissions. Transp. Res. Part D. 2022;107:103282. doi: 10.1016/j.trd.2022.103282.
  71. Suarez-Bertoa R., Valverde V., Clairotte M., Pavlovic J., Giechaskiel B., Franco V., Kregar Z., Astorga C. On-road emissions of passenger cars beyond the boundary conditions of the real-driving emissions test. Environ. Res. 2019;176:108572. doi: 10.1016/j.envres.2019.108572.
  72. Zachariadis T., Ntziachristos L., Samaras Z. The effect of age and technological change on motor vehicle emissions. Transportation Research Part D: Transport and Environment. 2001;6:221–227. doi: 10.1016/S1361-9209(00)00025-0.
  73. Bishop G. A., Stedman D. H. A decade of on-road emissions measurements. Environ. Sci. Technol. 2008;42(5):1651–1656. doi: 10.1021/es702413b.
  74. Buberger J., Kersten A., Kuder M., Eckerle R., Weyh T., Thiringer T. Total CO2-equivalent life-cycle emissions from commercially available passenger cars. Renewable Sustainable Energy Rev. 2022;159:112158. doi: 10.1016/j.rser.2022.112158.
  75. Scavuzzo C. M., Scavuzzo J. M., Campero M. N., Anegagrie M., Aramendia A. A., Benito A., Periago V. Feature importance: Opening a soil-transmitted helminth machine learning model via SHAP. Infectious Disease Modelling. 2022;7(1):262–276. doi: 10.1016/j.idm.2022.01.004.
  76. Weller K., Lipp S., Röck M., Matzer C., Bittermann A., Hausberger S. Real World Fuel Consumption and Emissions from LDVs and HDVs. Frontiers in Mechanical Engineering. 2019;5:1–22. doi: 10.3389/fmech.2019.00045.
  77. Hesamian G., Torkian F., Johannssen A., Chukhrova N. A learning system-based soft multiple linear regression model. Intell. Syst. Appl. 2024;22:200378. doi: 10.1016/j.iswa.2024.200378.
  78. Ma J., Yu Z., Qu Y., Xu J., Cao Y. Application of the XGBoost machine learning method in PM2.5 prediction: A case study of Shanghai. Aerosol and Air Quality Research. 2020;20(1):128–138. doi: 10.4209/aaqr.2019.08.0408.
  79. Naghibi S. A., Hashemi H., Berndtsson R., Lee S. Application of extreme gradient boosting and parallel random forest algorithms for assessing groundwater spring potential using DEM-derived factors. J. Hydrol. 2020;589:125197. doi: 10.1016/j.jhydrol.2020.125197.
  80. Rosero F., Fonseca N., Mera Z., López J. Assessing on-road emissions from urban buses in different traffic congestion scenarios by integrating real-world driving, traffic, and emissions data. Sci. Total Environ. 2023;863:161002. doi: 10.1016/j.scitotenv.2022.161002.
  81. European Parliament and Council of the European Union. Regulation (EU) 2019/1242 setting CO2 emission performance standards for new heavy-duty vehicles and amending Regulations (EC) No 595/2009 and (EU) 2018/956 and Council Directive 96/53/EC (Text with EEA relevance). Off. J. Eur. Union, L 198, 2019; pp 202–240. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32019R1242 (accessed March 12, 2025).
  82. MoEFCC. National Clean Air Programme (NCAP). Government of India, 2019. Retrieved from: https://energyandcleanair.org/wp/wp-content/uploads/2024/01/Tracing-the-Hazy-Air-2024-Progress-Report-on-National-Clean-Air-Programme-NCAP.pdf (accessed April 11, 2024).
  83. Ministry of Ports, Shipping & Waterways (MoPSW). Harit Sagar – Green Port Guidelines; Government of India, 2023. Retrieved July 15, 2024, from https://shipmin.gov.in/sites/default/files/Harit%20Sagar%20-%20Green%20Port%20Guidelines%20.pdf.

Associated Data


Supplementary Materials

ao5c12136_si_001.pdf (1.9MB, pdf)

Data Availability Statement

The data, including the code repository of Python scripts for the machine learning (ML) models, are available in the model source files at doi: 10.5281/zenodo.17043313.


Articles from ACS Omega are provided here courtesy of American Chemical Society
