Application of extreme learning machine (ELM) forecasting model on CO2 emission dataset of a natural gas-fired power plant in Dhaka, Bangladesh

Mustafizur Rahman; Faijunnesa Rashid; Sujit Kumar Roy; Md Ahosan Habib

doi:10.1016/j.dib.2024.110491

2024 May 3;54:110491. doi: 10.1016/j.dib.2024.110491

PMCID: PMC11106830 PMID: 38774245

Abstract

Understanding and predicting CO₂ emissions from individual power plants is crucial for developing effective mitigation strategies. This study analyzes and forecasts CO₂ emissions from an engine-based natural gas-fired power plant in the Dhaka Export Processing Zone (DEPZ), Bangladesh, and presents the underlying dataset together with an Extreme Learning Machine (ELM) prediction model. Using monthly electricity generation and gas consumption data, CO₂ emissions (tons) were estimated from the measured energy use. The ELM models were trained on the CO₂ emissions data from January 2015 to December 2022 and used to forecast CO₂ emissions until December 2026. The ELM modeling method was employed due to its efficiency and accuracy in prediction. The ELM models achieved a Root Mean Square Error (RMSE) of 3494.46 (<5000), a Mean Absolute Error (MAE) of 2013.42 (<2500), and a Mean Absolute Scaled Error (MASE) of 0.93 (close to 1), all of which fall within the acceptable range. While the specific operational strategy of the studied plant is not available, the provided data can serve as a baseline or benchmark for comparison with similar facilities and for future research on optimizing operations and CO₂ mitigation strategies. Although natural gas is a cleaner alternative, emission reduction remains essential. This data-driven approach using a Bangladeshi case study provides a replicable framework for optimizing plant operations and for measuring and forecasting CO₂ emissions from similar facilities, contributing to global climate change mitigation.

Keywords: Air pollutants, CO₂ emission from natural gas-fired power plant, Emission forecasting, Application of ELM model

Specifications Table

Subject	Pollution
Specific subject area	CO₂ emission estimation for an engine-based natural gas-fired power plant, and forecasting of the emissions data with a machine learning model, the extreme learning machine (ELM).
How data were acquired	Fuel consumption and electricity generation data were acquired from monthly production reports, and CO₂ emissions were calculated using fuel consumption data.
Data format	Raw and Analyzed
Type of data	Tabular
Data collection	Data on fuel consumption and electricity generation were collected, and the fuel consumption data were then entered into the Greenhouse Gas Protocol calculator to calculate CO₂ emissions. CO₂ emissions are calculated according to the method of the IPCC (2006) with equation (i): $E_{c} = \sum_{d} B_{d} \times EF_{d}$ (i), where $E_{c}$ = emissions of CO₂/CO₂eq (kg); $B_{d}$ = estimated consumption of fuel $d$ (TJ); $EF_{d}$ = emission factor of fuel $d$ (kg/TJ); $d$ = type of fuel (e.g. residual fuel oil, diesel, natural gas, LPG, etc.).
Data source location	A natural gas-fired power plant with 86 MW electricity generation capacity at Dhaka Export Processing Zone Authority (DEPZ), Bangladesh. (Geographic coordinates: 23.947130, 90.283835)
Data accessibility	Raw data can be retrieved from the Mendeley repository- Rahman, Mustafizur; Rashid, Faijunnesa (2023), “Electricity generation, natural gas consumption and CO₂ emission data of a power plant in Dhaka, Bangladesh”, Mendeley Data, V4, doi: 10.17632/63pxv64h75.4 https://data.mendeley.com/datasets/63pxv64h75/4


1. Value of the Data

• These data are useful because they quantitatively estimate CO₂ emissions from engine-based natural gas-fired power plants based on measured energy use rather than on direct measurement of the CO₂ itself.
• The data can be used to identify trends in CO₂ emissions from the power plant over time, which can help inform decisions about future energy policies and regulations.
• These data will be useful in comparing CO₂ emissions from this power plant with those of other similar plants, identifying best practices and potential areas for improvement. For example, the data could be used to assess the effectiveness of different emission control technologies or operational strategies used by other plants.
• The data can be used to raise awareness about CO₂ emissions from engine-based natural gas-fired power plants and the need for cleaner energy sources. They could also inspire innovations in power plant technology that further reduce CO₂ emissions.

2. Data Description

This article describes a dataset, hosted on Mendeley Data, that was collected at an 86 MW combined cycle, natural gas-fired power plant located in the Dhaka Export Processing Zone Authority (DEPZ), Bangladesh. The data on electricity generation (megawatts, MW), fuel consumption (cubic meters, m³), and fuel type (natural gas) were collected directly from the plant's internal production reports with permission and cover the period from January 2015 to December 2022. The full production reports are withheld due to confidentiality concerns, as they contain sensitive information such as fuel pricing, customer contracts, employee details, and compliance data. The collected data were then processed following IPCC guidelines to estimate CO₂ emissions (tons) from fuel consumption using the Greenhouse Gas Protocol emission calculator (https://ghgprotocol.org/calculation-tools-and-guidance). A Microsoft Excel sheet was prepared from the estimated CO₂ emissions (tons), and the data were analyzed and forecasted up to 2026 using RStudio, version 2023.09.1+494.
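The emission estimate described above follows equation (i) in the Specifications Table: fuel consumption in TJ multiplied by an emission factor and summed over fuel types. The following is a minimal sketch of that arithmetic, not the Greenhouse Gas Protocol tool itself. The emission factor of 56,100 kg CO₂/TJ is the IPCC (2006) default for natural gas; the net calorific value used to convert m³ to TJ and the monthly gas volume are illustrative assumptions and do not come from the dataset.

```python
# IPCC (2006) default CO2 emission factor for natural gas, in kg CO2 per TJ.
EF_NATURAL_GAS_KG_PER_TJ = 56_100.0

# ASSUMPTION: net calorific value of 36 MJ per cubic metre (36e-6 TJ/m3).
# The article does not report the value used; this is a placeholder.
ASSUMED_NCV_TJ_PER_M3 = 36.0e-6


def gas_m3_to_tj(volume_m3, ncv_tj_per_m3=ASSUMED_NCV_TJ_PER_M3):
    """Convert a gas volume (m3) to energy content (TJ)."""
    return volume_m3 * ncv_tj_per_m3


def co2_tons(consumption_tj, emission_factor_kg_per_tj):
    """Equation (i): sum over fuel types of consumption (TJ) x factor (kg/TJ).

    Both arguments are dicts keyed by fuel type; the result is in metric tons.
    """
    total_kg = sum(
        energy_tj * emission_factor_kg_per_tj[fuel]
        for fuel, energy_tj in consumption_tj.items()
    )
    return total_kg / 1000.0


# Hypothetical month with 1,000,000 m3 of natural gas burned.
monthly_gas_m3 = 1_000_000.0
consumption = {"natural_gas": gas_m3_to_tj(monthly_gas_m3)}
factors = {"natural_gas": EF_NATURAL_GAS_KG_PER_TJ}
monthly_emissions_tons = co2_tons(consumption, factors)
print(round(monthly_emissions_tons, 1))
```

With a single fuel the sum has one term, but keeping the dictionary form mirrors the summation over fuel types in equation (i).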

3. Experimental Design, Materials, and Methods

The ELM is a training approach designed for feedforward neural network architectures, including single-hidden-layer configurations. Instead of iteratively adjusting the weights between the input and hidden layers, ELM assigns the input-to-hidden weights and the hidden layer biases randomly and keeps them fixed. It then uses the classical Moore–Penrose generalized inverse to determine analytically the weights connecting the hidden layer to the output layer, which significantly reduces the computational complexity compared with traditional backpropagation-based training methods [1], [2], [3]. Compared with conventional gradient-based learning techniques for single-layer feedforward neural networks (SLFNs), ELM achieves faster execution, requires fewer learnable parameters, and exhibits stronger generalization capabilities. These merits stem from ELM's ability to determine the output weights analytically, bypassing iterative weight updates, which can be computationally expensive and prone to overfitting [4].

To perform this study, the default "elm" and "forecast" functions from the "nnfor" package in R were used. The architecture of the ELM is simple, consisting of three distinct segments: the Input Layer, the Hidden Layer, and the Output Layer, as depicted in Fig. 1. The mathematical formulation of the ELM model is given by the following equations:

3.1. Input layer

In ELM, the Input Layer is where the data enters the model. It's represented as a vector called X, which contains the input features.

$X = [X_{1}, X_{2}, X_{3}, \ldots, X_{N}]$ (ii)

In this representation, each $X_{i}$ corresponds to a specific feature or attribute of the data, and $N$ is the total number of features. The Input Layer is responsible for passing the data to the Hidden Layer for further processing.

3.2. Hidden Layer

The output of the hidden layer, denoted as H, is calculated by applying an activation function g element-wise to a linear combination of the input features X from the previous layer and their corresponding weights W, with an added bias term b.

$H = g(W \times X + b)$ (iii)

3.3. Output layer

In ELM, the output layer weights are calculated using the Moore–Penrose inverse of the hidden layer output matrix H. This output weight matrix is denoted as β. The output predictions, represented as f(x), are calculated by multiplying the hidden layer output H by the output weights β:

$f(x) = H \times \beta$ (iv)
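Equations (ii) to (iv) can be illustrated with a short sketch. This is not the "nnfor" implementation used in the study; it is a generic ELM regression written with NumPy under several assumptions: a tanh activation, ten hidden nodes, a toy quadratic target, and samples stored as rows (so the hidden layer is computed as g(XW + b), the transpose of the layout in equation (iii)). The function names `elm_fit` and `elm_predict` are hypothetical.

```python
import numpy as np


def elm_fit(X, y, n_hidden=10, seed=0):
    """Fit a single-hidden-layer ELM.

    The input-to-hidden weights W and biases b are drawn at random and
    never updated; only the output weights beta are solved for.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, fixed input weights
    b = rng.normal(size=n_hidden)                # random, fixed hidden biases
    H = np.tanh(X @ W + b)                       # hidden layer output, eq. (iii)
    beta = np.linalg.pinv(H) @ y                 # Moore-Penrose least-squares solution
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Predictions f(x) = H x beta, eq. (iv)."""
    return np.tanh(X @ W + b) @ beta


# Toy data (assumption): 50 points of y = x^2 on [-1, 1].
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = X[:, 0] ** 2

W, b, beta = elm_fit(X, y, n_hidden=10)
pred = elm_predict(X, W, b, beta)
train_mse = float(np.mean((pred - y) ** 2))
```

Because the only fitted quantity is β, training reduces to a single linear least-squares solve, which is the source of the speed advantage described above.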

In this study, an ELM model was developed for time series forecasting. The ELM architecture consisted of a single hidden layer with 1 hidden node (hd = 1). The output layer weights (β) were estimated separately for each of the 20 repetitions of the training process (length(β) = 20). Similarly, the output node biases (b) were computed for each of the 20 training repetitions (length(b) = 20). The input data were preprocessed by incorporating 4 lagged values (lags = 4) to capture the temporal dependencies. No exogenous variables were used, as indicated by the null values for xreg.lags and xreg.minmax. The time series exhibited seasonal patterns, which were accommodated using deterministic seasonal dummies (sdummy = 1). While no seasonal frequencies were explicitly coded (ff.det = 0), the deterministic seasonality type was set to 1 (det.type = '1'). The input time series consisted of 96 monthly observations. The ELM model was trained on the provided time series data, and the predicted values were obtained for 84 time points. The structure of the ELM model is presented in Fig. 1.
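The preprocessing described above, lagged values plus deterministic seasonal dummies, can be sketched as the construction of a design matrix. The article does not list which four lags were selected, so the default `lags=(1, 2, 3, 4)` below is an assumption, as are the function name `build_design` and the toy series. As a side observation, 84 usable rows out of 96 observations would be consistent with a maximum lag of 12, but that is an inference and is not stated in the source.

```python
def build_design(series, lags=(1, 2, 3, 4), period=12):
    """Build lagged inputs and seasonal dummies for a monthly series.

    Each row holds the values at the requested lags followed by
    period - 1 seasonal dummy variables (the last season is the reference).
    Returns (rows, targets).
    """
    max_lag = max(lags)
    rows, targets = [], []
    for t in range(max_lag, len(series)):
        lagged = [series[t - k] for k in lags]
        season = t % period
        dummies = [1.0 if season == s else 0.0 for s in range(period - 1)]
        rows.append(lagged + dummies)
        targets.append(series[t])
    return rows, targets


# Toy series (assumption): 96 monthly values, matching the length of the dataset.
series = [float(v) for v in range(96)]
rows, targets = build_design(series)

# Hypothetical lag set whose largest lag is 12.
rows_12, targets_12 = build_design(series, lags=(1, 2, 3, 12))
```

The first `max(lags)` observations cannot be used as targets because their lagged inputs are unavailable, which is why the number of fitted points is smaller than the length of the series.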

The linear ELM model was applied to the CO₂ emissions from 2015 to 2022 to forecast CO₂ emissions up to 2026 (Fig. 2), and RStudio was used to visualize the historical data and the future emissions predicted by the 20 trained models. The x-axis represents time from 2015 to 2026, and the y-axis represents CO₂ emissions in tons. The black line represents the historical CO₂ emissions data from 2015 to 2022, and the bold blue line represents the forecasted values of CO₂ emissions. After 2022, multiple grey lines indicate forecasts that diverge as time progresses, showing a range of possible future values. Each grey line represents a different realization of the model, so the ensemble gives a more comprehensive picture of the uncertainty and variability in the ELM model's predictions. By examining this ensemble, researchers can assess the likelihood of different outcomes and make more informed decisions. The graph shows a decrease in CO₂ emissions over time, with a sharp decrease starting in 2022. The ELM model can only learn from the data it is trained on; if the training data do not capture all the real-world factors influencing CO₂ emissions, the forecast may not reflect reality [5].
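The relationship between the individual grey forecasts and the single bold forecast can be illustrated by combining ensemble members step by step. The article does not state which combination operator was used, so the median below is an assumption, and the three short member forecasts are made-up numbers for illustration only.

```python
from statistics import median


def combine_forecasts(member_forecasts):
    """Combine ensemble members into one forecast using the per-step median.

    member_forecasts is a list of equally long lists, one per trained model.
    """
    horizon = len(member_forecasts[0])
    if any(len(member) != horizon for member in member_forecasts):
        raise ValueError("all member forecasts must have the same horizon")
    return [
        median(member[h] for member in member_forecasts)
        for h in range(horizon)
    ]


# Hypothetical forecasts from three ensemble members over three steps.
members = [
    [10.0, 9.0, 8.0],
    [12.0, 9.5, 7.0],
    [11.0, 10.0, 6.0],
]
combined = combine_forecasts(members)
print(combined)
```

The spread of the members at each step (for example, the range from the lowest to the highest value) gives a rough indication of how strongly the trained models disagree at that horizon.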

Lower RMSE and MAE values indicate better accuracy in predicting future values; here, RMSE and MAE values below 5000 and 2500 tons, respectively, are considered acceptable for accurate forecasting [6]. MASE scales the model's mean absolute error by that of a naive benchmark forecast, so a value below 1 indicates that the model outperforms the benchmark. The ELM model applied to the CO₂ emission dataset has an RMSE of 3494.46, an MAE of 2013.42, and a MASE of 0.93. These values indicate that the model fits the provided data well, and the model can be considered robust for the prediction of CO₂ emissions for this type of facility.
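The three accuracy measures can be written out explicitly. The sketch below uses the standard definitions; the article does not state which naive benchmark was used for MASE, so the one-step naive forecast (`m=1`) is an assumption, and the small example series are made up for illustration.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mase(actual, predicted, training, m=1):
    """Mean absolute scaled error.

    The model's MAE is divided by the in-sample MAE of a naive forecast
    that repeats the value observed m steps earlier (m=12 would give a
    seasonal naive benchmark for monthly data).
    """
    naive_errors = [abs(training[t] - training[t - m]) for t in range(m, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, predicted) / scale


# Hypothetical values for illustration only.
training = [1.0, 3.0, 2.0, 5.0]
actual = [3.0, 5.0, 4.0, 6.0]
predicted = [2.0, 5.0, 6.0, 5.0]

print(rmse(actual, predicted), mae(actual, predicted), mase(actual, predicted, training))
```

RMSE and MAE are expressed in the units of the data (tons of CO₂ here), whereas MASE is unit-free, which makes it comparable across series of different scale.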

Table 1 shows the monthly forecasted CO₂ emissions (tons) from the engine-based natural gas-fired power plant from 2023 to 2026. The average monthly forecasted CO₂ emission across the four years is 9832.56 tons. The highest forecasted CO₂ emissions occur in January 2023 at 12159.14 tons, and the lowest occur in December 2026 at 7485.34 tons. Fig. 3 presents the monthly CO₂ emission anomalies over the four-year forecast period, that is, the deviations from the mean of the forecasted CO₂ emissions, indicating by how much emissions are expected to increase or decrease.

Table 1.

Forecasted CO₂ emissions (tons) from the engine-based natural gas-fired power plant.

Month	2023	2024	2025	2026	Mean	Max	Min	STD
January	12159.1	10984.4	9784.78	8585.07	10378.35	12159.14	8585.067	1539.126
February	12064.3	10884.4	9684.8	8485.09	10279.66	12064.31	8485.092	1541.109
March	11966.1	10784.8	9584.81	8385.12	10180.19	11966.05	8385.117	1541.815
April	11857.9	10685.2	9484.84	8285.14	10078.28	11857.91	8285.141	1538.72
May	11746.8	10585	9384.88	8185.17	9975.456	11746.78	8185.166	1534.39
June	11659.8	10484.2	9284.9	8085.19	9878.534	11659.83	8085.19	1539.302
July	11590.4	10384.3	9184.92	7985.22	9786.212	11590.36	7985.215	1551.113
August	11495.6	10284.4	9084.95	7885.24	9687.538	11495.57	7885.24	1553.125
September	11400.8	10184.4	8984.97	7785.26	9588.861	11400.77	7785.264	1555.139
October	11297	10084.7	8884.99	7685.29	9488.005	11297.03	7685.289	1553.709
November	11184.5	9984.75	8785.02	7585.31	9384.896	11184.51	7585.313	1548.847
December	11083.8	9884.76	8685.04	7485.34	9284.726	11083.77	7485.338	1548.549
Mean	11625.5	10434.62	9234.908	8035.203
Max	12159.14	10984.41	9784.78	8585.067
Min	11083.77	9884.755	8685.042	7485.338
STD	347.4143	360.5472	360.4661	360.4664
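The yearly summary rows of Table 1 and the anomalies shown in Fig. 3 can be recomputed from the monthly values. The sketch below copies the forecasts from Table 1 as printed (so results differ from the table's summary rows in the last decimals) and assumes that the anomaly is the deviation from the overall 48-month mean and that STD is the sample standard deviation; neither choice is stated explicitly in the article.

```python
from statistics import mean, stdev

# Monthly forecasts (tons), January to December, copied from Table 1.
forecast = {
    2023: [12159.1, 12064.3, 11966.1, 11857.9, 11746.8, 11659.8,
           11590.4, 11495.6, 11400.8, 11297.0, 11184.5, 11083.8],
    2024: [10984.4, 10884.4, 10784.8, 10685.2, 10585.0, 10484.2,
           10384.3, 10284.4, 10184.4, 10084.7, 9984.75, 9884.76],
    2025: [9784.78, 9684.8, 9584.81, 9484.84, 9384.88, 9284.9,
           9184.92, 9084.95, 8984.97, 8884.99, 8785.02, 8685.04],
    2026: [8585.07, 8485.09, 8385.12, 8285.14, 8185.17, 8085.19,
           7985.22, 7885.24, 7785.26, 7685.29, 7585.31, 7485.34],
}

# Yearly summary rows (Mean and STD at the bottom of Table 1).
yearly_mean = {year: mean(values) for year, values in forecast.items()}
yearly_std = {year: stdev(values) for year, values in forecast.items()}

# Overall mean of all 48 forecasted months.
all_values = [v for values in forecast.values() for v in values]
overall_mean = mean(all_values)

# Anomaly (assumed definition): monthly forecast minus the overall mean.
anomalies = {
    year: [v - overall_mean for v in values]
    for year, values in forecast.items()
}

print(round(overall_mean, 2))
```

Positive anomalies mark months forecast above the four-year mean and negative anomalies mark months below it.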


Limitations

This data article focuses on CO₂ emissions data from a single natural gas-fired power plant over a specific timeframe. The study considers only one factor influencing CO₂ emissions, fuel consumption; other relevant factors such as plant running hours, fuel quality, or maintenance practices are not considered. Predicting future CO₂ emissions using a linear ELM model inherently involves uncertainty due to unforeseen changes in plant operations, fuel characteristics, or environmental conditions.

Ethics Statement

This dataset does not involve human subjects, animal experiments, or data collected from social media platforms.

CRediT authorship contribution statement

Mustafizur Rahman: Conceptualization, Methodology, Software, Formal analysis, Writing – original draft, Visualization, Supervision. Faijunnesa Rashid: Data curation, Writing – original draft, Formal analysis. Sujit Kumar Roy: . Md. Ahosan Habib: Writing – review & editing.

Acknowledgments

We would like to express our sincere gratitude to Kamrul Hasan from Gazi University, Turkey, for his valuable feedback and support in reviewing our research article. His insightful comments and suggestions have helped us improve the quality of our work. We also thank the anonymous reviewers for their constructive feedback and suggestions. Finally, we would like to acknowledge the support of our colleagues and friends who have contributed to this data article in various ways.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Electricity generation, natural gas consumption and CO2 emission data of a power plant in Dhaka, Bangladesh (Original data) (Mendeley Data).

References

1. G.-B. Huang, X. Ding, H. Zhou, Q.-Y. Zhu, L. Lekamalage, C. Kasun, H. Zhou, G.-B. Huang, C.M. Vong, G. Hinton, P. Vincent, Extreme learning machine for regression and multiclass classification, 2006. www.computer.org/intelligent.
2. Huang G.-B., Zhu Q.Y., Mao K.Z., Siew C.K., Saratchandran P., Sundararajan N. Can threshold networks be trained directly? IEEE Transactions on Circuits and Systems II: Express Briefs. 2006;53:187–191. doi: 10.1109/TCSII.2005.857540.
3. Huang G., Song S., Gupta J.N.D., Wu C. Semi-supervised and unsupervised extreme learning machines. IEEE Trans. Cybern. 2014;44:2405–2417. doi: 10.1109/TCYB.2014.2307349.
4. Yadav B., Ch S., Mathur S., Adamowski J. Estimation of in-situ bioremediation system cost using a hybrid extreme learning machine (ELM)-particle swarm optimization approach. J. Hydrol. (Amst) 2016;543:373–385. doi: 10.1016/j.jhydrol.2016.10.013.
5. Guo X., Yang J., Shen Y., Zhang X. Prediction of agricultural carbon emissions in China based on a GA-ELM model. Front Energy Res. 2023;11. doi: 10.3389/fenrg.2023.1245820.
6. Ibe F.C., Opara A.I., Duru C.E., Obinna I.B., Enedoh M.C. Statistical analysis of atmospheric pollutant concentrations in parts of Imo State, Southeastern Nigeria. Sci. Afr. 2020;7. doi: 10.1016/j.sciaf.2019.e00237.

