Data in Brief. 2024 May 3;54:110491. doi: 10.1016/j.dib.2024.110491

Application of extreme learning machine (ELM) forecasting model on CO2 emission dataset of a natural gas-fired power plant in Dhaka, Bangladesh

Mustafizur Rahman a,b, Faijunnesa Rashid c, Sujit Kumar Roy d, Md Ahosan Habib a
PMCID: PMC11106830  PMID: 38774245

Abstract

Understanding and predicting CO2 emissions from individual power plants is crucial for developing effective mitigation strategies. This study analyzes and forecasts CO2 emissions from an engine-based natural gas-fired power plant in the Dhaka Export Processing Zone (DEPZ), Bangladesh, and presents the underlying dataset together with an ELM-based prediction model. Using data on electricity generation and gas consumption, CO2 emissions (tons) were estimated from the measured energy use; the ELM models were trained on the emissions data from January 2015 to December 2022 and used to forecast emissions until December 2026. Although the specific operational strategy of the studied plant is not available, the data can serve as a baseline or benchmark for comparison with similar facilities and for future research on optimizing operations and CO2 mitigation strategies. The Extreme Learning Machine (ELM) modeling method was chosen for its efficiency and accuracy in prediction. The ELM models achieved a Root Mean Square Error (RMSE) of 3494.46 (<5000), a Mean Absolute Error (MAE) of 2013.42 (<2500), and a Mean Absolute Scaled Error (MASE) of 0.93 (close to 1), all of which fall within the acceptable range. Although natural gas is a cleaner alternative, emission reduction remains essential. This data-driven approach, using a Bangladeshi case study, provides a replicable framework for optimizing plant operations and for measuring and forecasting CO2 emissions from similar facilities, contributing to global climate change mitigation.

Keywords: Air pollutants, CO2 emission from natural gas-fired power plant, Emission forecasting, Application of ELM model


Specifications Table

Subject: Pollution
Specific subject area: CO2 emission estimation for an engine-based natural gas-fired power plant; application of a machine learning model, the extreme learning machine (ELM), to forecast emissions data.
How data were acquired: Fuel consumption and electricity generation data were acquired from monthly production reports, and CO2 emissions were calculated from the fuel consumption data.
Data format: Raw and Analyzed
Type of data: Tabular
Data collection: Data on fuel consumption and electricity generation were collected, and the fuel consumption data were then entered into the Greenhouse Gas Protocol calculator to calculate CO2 emissions. CO2 emissions are calculated according to the method of the IPCC (2006), using equation (i):
$E_c = \sum_{d} B_d \times eF_d$ (i)
where
$E_c$ = emissions of CO2/CO2eq (kg);
$B_d$ = estimated consumption of fuel $d$ (TJ);
$eF_d$ = emission factor of fuel $d$ (kg/TJ);
$d$ = type of fuel (e.g. residual fuel oil, diesel, natural gas, LPG, etc.).
Data source location: A natural gas-fired power plant with 86 MW electricity generation capacity at Dhaka Export Processing Zone Authority (DEPZ), Bangladesh (geographic coordinates: 23.947130, 90.283835).
Data accessibility: Raw data can be retrieved from the Mendeley repository: Rahman, Mustafizur; Rashid, Faijunnesa (2023), “Electricity generation, natural gas consumption and CO2 emission data of a power plant in Dhaka, Bangladesh”, Mendeley Data, V4, doi: 10.17632/63pxv64h75.4
https://data.mendeley.com/datasets/63pxv64h75/4

1. Value of the Data

  • These data are useful because they quantitatively estimate CO2 emissions from engine-based natural gas-fired power plants based on measured energy use rather than directly measuring the CO2 itself.

  • The data can be used to identify trends in CO2 emissions from the power plant over time, which can help inform decisions about future energy policies and regulations.

  • These data will be useful for comparing CO2 emissions from this power plant with those of other similar plants and for identifying best practices and potential areas for improvement. For example, the data could be used to assess the effectiveness of different emission control technologies or operational strategies used by other plants.

  • The data can be used to raise awareness about CO2 emissions from engine-based natural gas-fired power plants and the need for cleaner energy sources. They could also inspire innovations in power plant technology that further reduce CO2 emissions.

2. Data Description

This article describes a dataset, deposited in Mendeley Data, that was collected at an 86 MW combined cycle, natural gas-fired power plant located in Dhaka Export Processing Zone Authority (DEPZ), Bangladesh. The data on electricity generation (megawatts, MW), fuel consumption (cubic meters, m3), and fuel type (natural gas) were collected directly from the plant's internal production reports, with permission, and cover the period from January 2015 to December 2022. The full production reports are withheld due to confidentiality concerns, as they contain sensitive information such as fuel pricing, customer contracts, employee details, and compliance data. The collected data were then processed following IPCC guidelines to estimate CO2 emissions (tons) from fuel consumption using the Greenhouse Gas Protocol emission calculator (https://ghgprotocol.org/calculation-tools-and-guidance). A Microsoft Excel sheet was prepared from the calculated CO2 emissions (tons), and the data were analyzed and forecasted up to 2026 using RStudio, version 2023.09.1+494.
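The estimation step described above (fuel consumption converted to energy, then multiplied by an emission factor as in equation (i)) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the emission factor is the IPCC (2006) default for natural gas, the net calorific value is an illustrative figure that is not reported in the article, and the function name `co2_emissions_tons` is hypothetical. The Greenhouse Gas Protocol calculator may apply different factors.

```python
# IPCC (2006) default CO2 emission factor for natural gas in stationary combustion.
EF_NATURAL_GAS_KG_PER_TJ = 56_100.0

# ASSUMPTION: illustrative net calorific value; the plant-specific value is not
# reported in the article and should be taken from the gas supplier.
NCV_MJ_PER_M3 = 35.0


def co2_emissions_tons(gas_volume_m3, ncv_mj_per_m3=NCV_MJ_PER_M3,
                       ef_kg_per_tj=EF_NATURAL_GAS_KG_PER_TJ):
    """Estimate CO2 emissions (tons) from a volume of natural gas burned."""
    energy_tj = gas_volume_m3 * ncv_mj_per_m3 / 1_000_000  # B in equation (i); 1 TJ = 1e6 MJ
    emissions_kg = energy_tj * ef_kg_per_tj                 # Ec = B x eF for a single fuel
    return emissions_kg / 1000.0                            # kg -> metric tons


# Hypothetical monthly consumption, not a value from the dataset.
print(co2_emissions_tons(1_000_000))  # 1963.5
```

Because the plant burns a single fuel, the sum over fuel types in equation (i) reduces to one term here.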

3. Experimental Design, Materials, and Methods

The ELM is a training approach for feedforward neural networks, most commonly single-layer feedforward neural networks (SLFNs). Instead of iteratively adjusting the weights between the input and hidden layers, ELM assigns the input-to-hidden weights and the hidden layer biases randomly and leaves them fixed. It then uses the classical Moore–Penrose generalized inverse to determine analytically the optimal weights connecting the hidden layer to the output layer, which significantly reduces the computational cost compared with traditional backpropagation-based training methods [1–3]. Because only the output weights are learned, ELM requires fewer learnable parameters than conventional gradient-based techniques for SLFNs, executes faster, and exhibits stronger generalization capabilities. These merits stem from ELM's ability to determine the output weights analytically, bypassing iterative weight updates, which can be computationally expensive and prone to overfitting [4]. In this study, the default "elm" and "forecast" functions from the "nnfor" package in R were used. The architecture of the ELM is simple, consisting of three segments: the input layer, the hidden layer, and the output layer, as depicted in Fig. 1. The mathematical formulation of the ELM model is given by the following equations:

Fig. 1. The ELM model architecture.

3.1. Input layer

In ELM, the Input Layer is where the data enters the model. It's represented as a vector called X, which contains the input features.

$X = [X_1, X_2, X_3, \ldots, X_N]$ (ii)

In this representation, each $X_i$ corresponds to a specific feature or attribute of the data, and $N$ is the total number of features. The input layer is responsible for passing the data to the hidden layer for further processing.

3.2. Hidden Layer

The output of the hidden layer, denoted H, is calculated by applying an activation function g element-wise to a linear combination of the input features X and their corresponding weights W, with an added bias term b.

$H = g(W \times X + b)$ (iii)

3.3. Output layer

In ELM, the output layer weights are calculated using the Moore–Penrose inverse of the hidden layer output matrix. This output weight matrix is denoted β. The output predictions, f(x), are calculated by multiplying the hidden layer output H by the output weights β:

$f(x) = H \times \beta$ (iv)
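Equations (ii) to (iv) can be illustrated with a minimal, dependency-free sketch. It is not the "nnfor" implementation: the tanh activation, the uniform random initialization, and all function names are assumptions made for illustration. The output weights are obtained by solving the normal equations, which gives the same result as the Moore–Penrose solution only when the hidden layer output matrix has full column rank.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def hidden_output(x, weights, biases):
    """Equation (iii) for one sample: H = g(W x + b), with g = tanh (assumed)."""
    return [math.tanh(sum(w * xi for w, xi in zip(w_row, x)) + b)
            for w_row, b in zip(weights, biases)]


def train_elm(inputs, targets, n_hidden, seed=0):
    """Draw random, fixed hidden weights, then solve for the output weights beta."""
    rng = random.Random(seed)
    n_features = len(inputs[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_output(x, weights, biases) for x in inputs]
    # Normal equations (H^T H) beta = H^T y; equals pinv(H) y for full-column-rank H.
    hth = [[sum(row[i] * row[j] for row in h) for j in range(n_hidden)]
           for i in range(n_hidden)]
    hty = [sum(row[i] * t for row, t in zip(h, targets)) for i in range(n_hidden)]
    beta = solve_linear_system(hth, hty)
    return weights, biases, beta


def predict_elm(x, weights, biases, beta):
    """Equation (iv): f(x) = H beta."""
    return sum(hj * bj for hj, bj in zip(hidden_output(x, weights, biases), beta))
```

No iterative weight updates appear anywhere in the sketch: the only fitted quantities are the entries of beta, found in one linear solve.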

In this study, an ELM model was developed for time series forecasting. The ELM architecture consisted of a single hidden layer with one hidden node (hd = 1). The training process was repeated 20 times, and the output layer weights (β) and the output node biases (b) were estimated separately for each repetition (length(β) = 20, length(b) = 20). The input data were preprocessed by incorporating 4 lagged values (lags = 4) to capture the temporal dependencies. No exogenous variables were used, as indicated by the null values for xreg.lags and xreg.minmax. The time series exhibited seasonal patterns, which were accommodated using deterministic seasonal dummies (sdummy = 1). No seasonal frequencies were explicitly coded (ff.det = 0), and the deterministic seasonality type was set to 1 (det.type = '1'). The input time series consisted of 96 monthly observations. The ELM model was trained on this time series, and predicted values were obtained for 84 time points. The structure of the ELM model is presented in Fig. 1.
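The input preparation described above (lagged values plus deterministic seasonal dummies) can be sketched as follows. This is an assumption-laden illustration, not the preprocessing performed inside "nnfor": the function name `build_lagged_dataset` is hypothetical, the series is assumed to start in the first season (January), and one dummy is dropped to avoid collinearity. The number of usable rows here is the series length minus the largest lag, which need not match the 84 predicted values reported in the article.

```python
def build_lagged_dataset(series, n_lags=4, period=12):
    """Turn a univariate series into (inputs, targets) for one-step-ahead forecasting."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        # Lagged values y[t-1], ..., y[t-n_lags] capture temporal dependence.
        lagged = [series[t - k] for k in range(1, n_lags + 1)]
        # Deterministic seasonal dummies; the last season is the reference level.
        season = t % period
        dummies = [1.0 if season == s else 0.0 for s in range(period - 1)]
        inputs.append(lagged + dummies)
        targets.append(series[t])
    return inputs, targets


# Placeholder series of 96 monthly values (not the plant's data).
X, y = build_lagged_dataset([float(i) for i in range(96)])
print(len(X), len(X[0]))  # 92 15
```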

The linear ELM model was applied to the CO2 emissions from 2015 to 2022 to forecast CO2 emissions up to 2026 (Fig. 2), and RStudio was used to visualize the historical data and the forecasts from the 20 trained models. The x-axis represents time from 2015 to 2026, and the y-axis represents CO2 emissions in tons. The black line represents the historical CO2 emissions from 2015 to 2022, and the bold blue line represents the forecasted values. After 2022, multiple grey lines show forecasts that diverge as time progresses, indicating a range of possible future values. Each grey line represents a different realization of the model, so the ensemble gives a more comprehensive picture of the uncertainty and variability in the ELM predictions and allows the likelihood of different outcomes to be assessed. The graph shows a decrease in CO2 emissions over time, with a sharp decrease starting in 2022. The ELM model can only learn from the data it is trained on; if the training data do not capture all the real-world factors influencing CO2 emissions, the forecast may not perfectly reflect reality [5].

Fig. 2. CO2 emission over time (2015-2026).
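The relationship between the individual model forecasts (grey lines) and a single central forecast (blue line) can be sketched as below. The article does not state which combination operator was used, so the median here is an assumption, and the function name `combine_forecasts` is hypothetical.

```python
from statistics import median


def combine_forecasts(member_forecasts):
    """Combine per-model forecasts (one list per trained model) step by step."""
    horizon = len(member_forecasts[0])
    combined, spread = [], []
    for h in range(horizon):
        values = [member[h] for member in member_forecasts]
        combined.append(median(values))            # central forecast (assumed operator)
        spread.append((min(values), max(values)))  # envelope of the ensemble members
    return combined, spread


# Three hypothetical members with a two-step horizon (not values from the study).
central, envelope = combine_forecasts([[1.0, 10.0], [2.0, 20.0], [9.0, 30.0]])
print(central)   # [2.0, 20.0]
print(envelope)  # [(1.0, 9.0), (10.0, 30.0)]
```

A widening envelope over the horizon corresponds to the diverging grey lines after 2022.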

Lower RMSE and MAE values indicate better accuracy in predicting future values; here, an RMSE below 5000, an MAE below 2500, and a MASE close to 1 are considered acceptable for accurate forecasting [6]. A MASE below 1 means that the forecast errors are, on average, smaller than those of a naive benchmark forecast. The ELM model applied to the CO2 emission dataset has an RMSE of 3494.46, an MAE of 2013.42, and a MASE of 0.93. These values indicate that the model fits the provided data well and can be considered a robust model for predicting CO2 emissions for this type of industry.
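For reference, the three metrics can be computed as in the sketch below. The MASE scaling uses the in-sample one-step naive forecast by default; whether the study used this or a seasonal naive benchmark is not stated, so the `season` parameter and the function names are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mase(actual, predicted, training, season=1):
    """MAE scaled by the in-sample MAE of a naive forecast with the given lag."""
    naive_errors = [abs(training[t] - training[t - season])
                    for t in range(season, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, predicted) / scale


# Toy numbers for illustration only (not values from the dataset).
actual, predicted = [10.0, 12.0, 14.0], [11.0, 12.0, 12.0]
print(mae(actual, predicted))            # 1.0
print(mase(actual, predicted, actual))   # 0.5
```

Unlike RMSE and MAE, which are in tons and depend on the scale of the series, MASE is unitless, which is why it is judged against 1 rather than against an absolute threshold.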

Table 1 shows the monthly forecasted CO2 emissions (tons) from the engine-based natural gas-fired power plant from 2023 to 2026. The average monthly CO2 emission across the four years is 9832.56 tons. The highest forecasted emission is 12159.14 tons in January 2023, and the lowest is 7485.34 tons in December 2026. Fig. 3 presents the monthly CO2 emission anomalies over the four years, that is, the deviations of the forecasted emissions from their mean value, indicating by how much emissions are expected to increase or decrease.

Table 1. Forecasted CO2 emissions (tons) from the engine-based natural gas-fired power plant.

| Month | 2023 | 2024 | 2025 | 2026 | Mean | Max | Min | STD |
|---|---|---|---|---|---|---|---|---|
| January | 12159.1 | 10984.4 | 9784.78 | 8585.07 | 10378.35 | 12159.14 | 8585.067 | 1539.126 |
| February | 12064.3 | 10884.4 | 9684.8 | 8485.09 | 10279.66 | 12064.31 | 8485.092 | 1541.109 |
| March | 11966.1 | 10784.8 | 9584.81 | 8385.12 | 10180.19 | 11966.05 | 8385.117 | 1541.815 |
| April | 11857.9 | 10685.2 | 9484.84 | 8285.14 | 10078.28 | 11857.91 | 8285.141 | 1538.72 |
| May | 11746.8 | 10585 | 9384.88 | 8185.17 | 9975.456 | 11746.78 | 8185.166 | 1534.39 |
| June | 11659.8 | 10484.2 | 9284.9 | 8085.19 | 9878.534 | 11659.83 | 8085.19 | 1539.302 |
| July | 11590.4 | 10384.3 | 9184.92 | 7985.22 | 9786.212 | 11590.36 | 7985.215 | 1551.113 |
| August | 11495.6 | 10284.4 | 9084.95 | 7885.24 | 9687.538 | 11495.57 | 7885.24 | 1553.125 |
| September | 11400.8 | 10184.4 | 8984.97 | 7785.26 | 9588.861 | 11400.77 | 7785.264 | 1555.139 |
| October | 11297 | 10084.7 | 8884.99 | 7685.29 | 9488.005 | 11297.03 | 7685.289 | 1553.709 |
| November | 11184.5 | 9984.75 | 8785.02 | 7585.31 | 9384.896 | 11184.51 | 7585.313 | 1548.847 |
| December | 11083.8 | 9884.76 | 8685.04 | 7485.34 | 9284.726 | 11083.77 | 7485.338 | 1548.549 |
| Mean | 11625.5 | 10434.62 | 9234.908 | 8035.203 | | | | |
| Max | 12159.14 | 10984.41 | 9784.78 | 8585.067 | | | | |
| Min | 11083.77 | 9884.755 | 8685.042 | 7485.338 | | | | |
| STD | 347.4143 | 360.5472 | 360.4661 | 360.4664 | | | | |

Fig. 3. Monthly CO2 emission anomaly.
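An anomaly series of the kind shown in Fig. 3 can be derived from the forecasts as in the sketch below, which uses the 2026 column of Table 1. The baseline is assumed to be the mean of whatever series is passed in; the article does not state whether Fig. 3 uses the overall four-year mean or a per-year mean, and the function name `anomalies` is hypothetical.

```python
def anomalies(values):
    """Return the mean of the series and each value's deviation from that mean."""
    baseline = sum(values) / len(values)
    return baseline, [v - baseline for v in values]


# Forecasted monthly CO2 emissions (tons) for 2026, January to December (Table 1).
forecast_2026 = [8585.07, 8485.09, 8385.12, 8285.14, 8185.17, 8085.19,
                 7985.22, 7885.24, 7785.26, 7685.29, 7585.31, 7485.34]

baseline, deviations = anomalies(forecast_2026)
print(round(baseline, 1))  # 8035.2
```

Positive deviations mark months forecast above the baseline and negative deviations mark months below it; for 2026 the sign changes between June and July.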

Limitations

This data article focuses on CO2 emissions data from a single natural gas-fired power plant over a specific timeframe. The study considers only one factor influencing CO2 emissions, namely fuel consumption; other relevant factors, such as plant running hours, fuel quality, or maintenance practices, were not considered. Predicting future CO2 emissions with a linear ELM model inherently involves uncertainty due to unforeseen changes in plant operations, fuel characteristics, or environmental conditions.

Ethics Statement

This dataset does not involve human subjects, animal experiments, or data collected from social media platforms.

CRediT authorship contribution statement

Mustafizur Rahman: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft, Visualization, Supervision. Faijunnesa Rashid: Data curation, Writing – original draft, Formal analysis. Sujit Kumar Roy: . Md. Ahosan Habib: Writing – review & editing.

Acknowledgments

We would like to express our sincere gratitude to Kamrul Hasan from Gazi University, Turkey, for his valuable feedback and support in reviewing our research article. His insightful comments and suggestions have helped us improve the quality of our work. We also thank the anonymous reviewers for their constructive feedback and suggestions. Finally, we would like to acknowledge the support of our colleagues and friends who have contributed to this data article in various ways.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

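Data Availability

The raw data are available in the Mendeley Data repository (doi: 10.17632/63pxv64h75.4; https://data.mendeley.com/datasets/63pxv64h75/4).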

References

  • 1. G.-B. Huang, X. Ding, H. Zhou, Q.-Y. Zhu, L. Lekamalage, C. Kasun, H. Zhou, G.-B. Huang, C.M. Vong, G. Hinton, P. Vincent, Extreme learning machine for regression and multiclass classification, 2006. www.computer.org/intelligent.
  • 2. Huang G.-B., Zhu Q.Y., Mao K.Z., Siew C.K., Saratchandran P., Sundararajan N. Can threshold networks be trained directly? IEEE Transactions on Circuits and Systems II: Express Briefs. 2006;53:187–191. doi: 10.1109/TCSII.2005.857540.
  • 3. Huang G., Song S., Gupta J.N.D., Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans. Cybern. 2014;44:2405–2417. doi: 10.1109/TCYB.2014.2307349.
  • 4. Yadav B., Ch S., Mathur S., Adamowski J. Estimation of in-situ bioremediation system cost using a hybrid extreme learning machine (ELM)-particle swarm optimization approach. J. Hydrol. (Amst) 2016;543:373–385. doi: 10.1016/j.jhydrol.2016.10.013.
  • 5. Guo X., Yang J., Shen Y., Zhang X. Prediction of agricultural carbon emissions in China based on a GA-ELM model. Front Energy Res. 2023;11. doi: 10.3389/fenrg.2023.1245820.
  • 6. Ibe F.C., Opara A.I., Duru C.E., Obinna I.B., Enedoh M.C. Statistical analysis of atmospheric pollutant concentrations in parts of Imo State, Southeastern Nigeria. Sci. Afr. 2020;7. doi: 10.1016/j.sciaf.2019.e00237.
