Heliyon. 2024 Feb 15;10(4):e26397. doi: 10.1016/j.heliyon.2024.e26397

Optimizing home energy management: Robust and efficient solutions powered by attention networks

Mounica Nutakki 1, Srihari Mandava 1,
PMCID: PMC10906325  PMID: 38434054

Abstract

This paper explores the integration of attention networks into home energy management systems (HEMS) to enhance the robustness and efficiency of energy consumption optimization. With the growing demand for smart grid technologies, achieving demand-side response becomes paramount. The proposed solution leverages attention networks to dynamically allocate significance to various aspects of energy consumption patterns, accounting for the diverse load types and dynamic loading scenarios present in households. In this investigation, we focus on the AMPds2 dataset, characterized by intricate loading patterns, and assess performance across several time series forecasting methodologies, including recurrent neural networks (RNN), long short-term memory (LSTM) networks, temporal convolutional networks (TCN), and transformers. Each methodology undergoes performance evaluation using diverse hyperparameter combinations. Root mean square error (RMSE) and mean absolute error (MAE) are employed as evaluation metrics, the Adam and Adamax optimizers are applied, and the sigmoid, linear, tanh, and ReLU activation functions are implemented. A comprehensive performance analysis covers 16 hyperparameter combinations across the four time series models. Through meticulous scrutiny, it is determined that using transformers to forecast energy and load patterns yields a 4% increase in accuracy, as elucidated in the results section. The implementation of this study is carried out on the Python 3.2 platform, and the matplotlib library is employed to visualize the comparison between actual and predicted data.

Keywords: Deep learning, Home energy management systems, Machine learning, Smart grid, Transformers

Nomenclature

ADAM: Adaptive moment estimation

Adamax: Adaptive moment estimation based on the infinity norm (maximum)

ANN: Artificial Neural Network

ARIMA: Autoregressive Integrated Moving Average

ARMA: Autoregressive Moving Average

IA: Immune Algorithm

KNN: K-Nearest Neighbor

LSTM: Long Short-Term Memory

MAE: Mean Absolute Error

MAPE: Mean Absolute Percentage Error

NYISO: New York Independent System Operator

RES: Renewable Energy Sources

RMSE: Root Mean Square Error

RNN: Recurrent Neural Network

SARIMA: Seasonal Autoregressive Integrated Moving Average

SVM: Support Vector Machine

TCN: Temporal Convolutional Network

1. Introduction

In response to growing global energy demands and the simultaneous commitment to sustainability goals [1], there is a shift towards embracing Industry 4.0. Industry 4.0 advocates for creating smart and interactive environments within industrial settings [2]. This entails enhancing device communication, optimizing data utilization, and incorporating intelligence into industrial processes. The faster the technology adoption, the swifter the progress towards achieving sustainability objectives [3]. Various aspects of our daily lives have already integrated information and communication technology to enhance sustainability [4], [5]. The energy sector is also transforming, transitioning from traditional grids to smart grids. Smart grids enable the integration of renewable energy sources, thereby reducing CO2 emissions [6]. Additionally, they incorporate demand-side response (DSR) capabilities to regulate energy consumption and alleviate pressure on the main grid [7]. Fig. 1 illustrates the operation of a smart grid, highlighting the pivotal role of communication between smart homes, renewable energy sources (RES), and the central controller in achieving demand response and bidirectional power flow [8], [9]. To attain demand-side response, the pivotal factor lies in monitoring the energy consumption of households. Given that households encompass various load types, coupled with dynamic loading scenarios, monitoring and scheduling these loads becomes a challenging task [10]. Consequently, developing a robust energy management system is imperative for achieving demand-side response in smart grids. Illustrated in Fig. 2, Home Energy Management Systems (HEMS) are responsible for monitoring, logging, and scheduling loads [11]. It is fundamental to recognize that the load pattern heavily depends on operating time, and the unit consumption also fluctuates over time. Therefore, in applications like load monitoring and scheduling [12], understanding the load behavior in advance aids in optimizing energy consumption and attaining the desired demand-side response. Load forecasting based on consumption timing is considered a crucial step in HEMS.

Figure 1. Smart Grid Overview.

Figure 2. Home Energy Management System.

Many time series forecasting methods have been proposed in the literature, ranging from statistical methods to artificial intelligence methods. Statistical models have been developed and improved over time as the complexity of time series data has increased [13]. The methods were proposed in roughly the following order: persistence methods, autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), and seasonal ARIMA (SARIMA) [14]. Load datasets from the New York Independent System Operator (NYISO), the Energy Information Administration (EIA, USA), and Paschim Gujarat Vij Company Limited (PGVCL, India) have been used in [15], [16] to predict future electrical energy demand using ARIMA and SARIMA time series models. Similarly, two years of real-time data from the state of Karnataka in India were considered in [17] for load forecasting using AR, ARMA, and ARIMA models; their accuracy comparison shows that ARIMA performs better than AR and ARMA. [18] proposed an ARIMAX (Auto Regressive Integrated Moving Average with external input) model for power demand forecasting in an office building. As the nonlinearity in the data increases, statistical methods become unable to track the non-linearities, and research has shifted towards artificial intelligence. AI methods have the capability of adapting to the non-linearities and forecasting the data. Moreover, AI techniques are well-suited for big data scenarios, as they can efficiently process large and high-dimensional datasets. Their ability to capture temporal dependencies and adapt to changing patterns makes them valuable for forecasting tasks. Additionally, AI models can continuously learn and update themselves, ensuring forecasts remain accurate and up-to-date. AI techniques offer enhanced flexibility, scalability, and adaptability, making them a compelling choice for time series forecasting in complex and dynamic environments. The different time series forecasting methods are shown in Fig. 3.

Figure 3. Classification of forecasting techniques.

1.1. Problem statement

Based on present-day grid scenarios, an average smart home is a combination of small-scale distributed energy sources, a variety of loads, and various smart devices. Integrating these into a household and communicating with the utility is therefore challenging. In this scenario, time series forecasting is key, as discussed before. A reliable and accurate forecasting method should be used to forecast the multiple utilities, which are diverse in nature. Basic statistical methods fall short in adapting to highly non-linear data; therefore, artificial intelligence methods are employed. AI models, especially deep learning models such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, excel at handling complexities and capturing nonlinear relationships in time series data. They can automatically learn relevant features from raw data, eliminating the need for manual feature engineering. However, training these models can be computationally expensive, especially for large and complex datasets, and they can suffer from vanishing or exploding gradients. Techniques such as gradient clipping are necessary to mitigate these issues. Additionally, RNNs and LSTMs may struggle with long-term predictions, as errors can accumulate over time. This is known as the "short-term memory" problem, which can limit their effectiveness in capturing patterns with long-term dependencies. Attention networks, popularly known as transformers, have been proposed to overcome the shortcomings of these deep learning models. They excel at capturing long-term dependencies in sequential data: they use self-attention mechanisms to consider dependencies between all input tokens, allowing them to capture long-range relationships efficiently, in contrast to recurrent neural networks (RNNs). These models are also highly parallelizable, making them more efficient for training and inference. By using self-attention, computations can be performed over all tokens in parallel, resulting in faster training times and improved scalability, which is particularly useful when dealing with large datasets. Several reasons why transformers are particularly suitable for time series forecasting tasks are shown in Fig. 4.

Figure 4. Advantages of transformers for time series forecasting.

1.2. Literature survey

The literature review explores various classes of forecasting models used in the field. These models include autoregressive (AR) models, statistical regressions, k-nearest neighbors (k-NN), decision trees (DT), support vector machines (SVM), artificial neural networks (ANN), ensemble models, improved techniques, and different hybrid models. The edge of SVM models [18] is their ability to handle nonlinearity using kernel functions. Hong [19] proposed a support vector regression (SVR) model enhanced with the immune algorithm (IA) to forecast the annual electric load of a specific region in Taiwan. The author also presented suitable parameters for developing the SVR load forecasting model in another study [20]. The results indicate that the forecasting mean absolute percentage error (MAPE) for all four regions remained below 2.45% when utilizing the SVRIA model. Xuemei et al. [21] introduced the use of least-squares SVM (LS-SVM) for predicting the cooling load in an office building located in Guangzhou, China. The study utilized hourly climate data and building cooling load data collected over a period of five months to develop the model. A comparison was made with a back-propagation Artificial Neural Network (ANN), highlighting the superior performance of LS-SVM with a Mean Absolute Relative Error (MARE) of 1.65. In a separate study by Bozic et al. [22], the application of LS-SVM was proposed for short-term load forecasting. Hourly data spanning a week was used, allowing for both hourly and daily load forecasting. The obtained results reported the MAPE achieved by the LS-SVM model. These SVMs are memory-intensive, and their training can be computationally expensive, especially for large-scale datasets. Additionally, SVMs may not perform optimally when faced with highly noisy or imbalanced datasets. Moreover, SVMs may struggle to capture time-dependent patterns and long-term dependencies in time series data. These limitations have prompted the exploration of alternative models that have the potential to provide improved results. The benefit of the k-nearest neighbor (KNN) model [23] is its simplicity and ease of implementation. KNN does not require training, so it can be used for forecasting real-time or streaming time series. Lei et al. [24] introduced a locally linear model based on multivariate time series analysis in their study. They utilized daily electrical load data from a district in Chongqing, along with temperature data as inputs, for the period of January to March 2003. The outcomes of their model, which also considered temperature, revealed a maximum forecasting error of 6.16% and an average error of 0.97%. Brown et al. [25] presented a novel approach for real-time building energy modeling and prediction using kernel regression with kNN. They employed hourly power data from four buildings over a 1.5-year period starting from January 1st, 2007, for their modeling. The results demonstrated that the root mean square (RMS) error for all four buildings remained below 10.86. In another work, Al-Qahtani and Crone [26] proposed a multivariate k-NN regression method for forecasting electricity demand in the U.K. The results indicated that the proposed multivariate k-NN model achieved a mean absolute percentage error (MAPE) of 1.8133%. However, the drawback of KNN is its computational cost, especially when dealing with large datasets, as it requires calculating the similarity between the new data point and all historical data points. Deep learning is often regarded as an advancement over traditional machine learning methods due to its ability to address some of the drawbacks of conventional approaches.
The superiority of ANN models [27] lies in their ability to capture and model complex nonlinear relationships present in time series data. Gonzalez and Zamarreno [28] utilized a feedback ANN model to predict short-term electricity load. The model achieved a maximum MAPE of 2.88 for hourly load forecasting. This feedback model was initially developed by Schenker [29]; the study also highlights the importance of quantifying three aspects of the ANN. Interestingly, the results suggested that many neurons might not be necessary to achieve good prediction results, as observed in the study on prediction and control models [30]. In another application, Panapakidis and Dagoumas [31] employed ANNs in electricity price forecasting by combining cluster analysis with various ANN topologies. The findings revealed that the Mean Absolute Error for day-ahead price forecasting was below 7.18%. Energy consumption, scheduling, and prediction in smart buildings have also been implemented using a fuzzy time series [31]. It was shown that a variety of parameters, such as the defuzzification and fuzzification principles and the types of operators applied, influence the forecasting results. Similarly, a regression-based model in [32] is applied for forecasting the load and generation of the NIT Patna campus, and the model accuracy is compared with real-world smart meter data, which shows the effectiveness of the proposed method. A comparison between two projects, Energy Company of Pernambuco (CELPE) and DEESP/UFPE, was performed for forecasting using the PREVER software. ANNs with the MLP architecture, trained with the resilient backpropagation (RPROP) algorithm [33], were used to predict the load for 3, 7, 15, 30, and 45 days ahead. The authors found that PREVER was more accurate than CELPE's load forecasting. Table 1 summarizes the literature on load forecasting. ANN models do have certain limitations. Training ANNs can be computationally expensive, especially for large and complex datasets. They require a significant amount of data for training and may struggle with overfitting if not properly regularized. ANNs also behave largely as black-box models, and this lack of interpretability may pose challenges in industries or applications where explainability is critical. Hybrid models involve combining multiple machine-learning techniques, resulting in increased robustness. By leveraging the strengths of each individual technique [34], these models enhance forecasting accuracy. Zhuang et al. [35] introduced a hybrid prediction approach that combines Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) models. The findings demonstrated that the combined model outperformed a simple time series model, effectively addressing the challenge of fitting non-linearity. In the study conducted by Huang et al. [36], an innovative particle swarm optimization (PSO) technique was introduced to identify the Autoregressive Moving Average with exogenous inputs (ARMAX) model for short-term load forecasting. The experimental outcomes revealed that the proposed PSO-based model consistently achieved an error rate below 2.55% for all seasons. In their study, Nie et al. [37] presented a hybrid approach that combines ARIMA and SVM methods to forecast short-term electricity load. Evaluating the performance of their proposed hybrid method, they reported a Mean Absolute Percentage Error (MAPE) of 3.85%, whereas the MAPE for the individual ARIMA and SVM models was 4.5% and 4%, respectively.

Table 1. Literature Survey on Forecasting Methods.

Ref Aim Model Limitations
[19] Forecasting SVR with IA, SVMG, RM, and ANN models Only two metrics were used to evaluate model performance
[20] Load forecasting SVR with IA, SVMG, RM, and ANN models Accuracy neglected
[21] Load forecasting LS-SVM, BPNN Proposed model failed to handle uncertainties
[22] Short-term power load forecasting Intelligent algorithm used for load forecasting -
[24] Short-term power load forecasting PSR based on mutual information, ORLLM multivariate time series -
[25] Load forecasting Kernel regression, NN algorithms -
[26] Load forecasting Multivariate KNN regression Limited scientific studies of k-NN in forecasting time series data; limited to identifying past motifs of the same dependent variable
[27] Forecasting electric energy consumption Feedforward NN model Not all covariates are incorporated
[28] Short-term load prediction Feedback ANN, hybrid algorithm for training the ANN Proposed model failed to handle uncertainties
[29] Load forecasting Dynamic NN-feedback linearization, empirical models -
[29] Load forecasting Multilayer feedforward NN, backpropagation for learning Informative features not considered, less accurate
[30] Load forecasting ANN, hybrid models combining ANN with a clustering algorithm Accuracy neglected
[31] Long-term load prediction Clustering NN with logic operators, grey forecasting model Limitations in load forecasting; method is based on smoothed sequences
[32] Load forecasting Fuzzy time series for energy consumption forecasting -
[33] Short- and mid-term load prediction Fuzzy logic, NN Only two metrics were used to evaluate model performance
[34] Load forecasting Single forecasting methods (ANN, CR, TREE, LR and SVR), ensemble voting techniques with 2, 3 or 4 techniques Proposed model failed to handle uncertainties
[35] Load forecasting Time series models, NN models Not all covariates are incorporated
[36] Short-term load prediction PSO, EP STS technique may stall at local minima; PSO algorithm requires shorter computation time
[37] Short-term load prediction ARIMA, SVM Accuracy neglected

2. Methodology

2.1. Dataset

Data was collected from a house constructed in 1955 in British Columbia's Greater Vancouver metropolitan area. The home underwent significant renovations in 2005 and 2006 and received an EnerGuide rating of 82% from the Canadian Government, up from 61%. The residence is situated in the city of Burnaby, east of Vancouver. AMPds2 is publicly accessible for download from the Harvard Dataverse in a variety of formats [38], including the original CSV files, RData, and tab-delimited formats. AMPds2 includes per-meter descriptions of the house's electricity consumption. The AMPds2 data comprises four categories, Electricity, Water, Natural Gas, and Climate; in this paper, the electricity dataset is considered for the prediction analysis. For instance, the Electricity_billing.csv file contains electricity billing data, and the Electricity_CDE.csv file contains data from the clothes dryer (CDE) meter.

The 240 V, 200 A service is provided to the home by the provincial utility BC Hydro. Two DENT PowerScout 18 units metered 21 loads over a two-year period (2012-2014), with readings for all 21 loads recorded every minute. Fig. 5 gives the details of the 21 loads, and Fig. 6 shows the overall load consumption of the residence. A gas cooktop plug breaker, a microwave plug breaker, and a randomly selected lighting breaker were the three loads that were removed because no activity was noted. Measurements with very low values were recorded as zero.
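As a rough illustration of how one of these meter files might be loaded and aggregated in Python, the sketch below uses pandas; the file name Electricity_WHE.csv (whole-house meter) and the column names unix_ts and P are assumptions about the CSV layout and may need adjusting to the downloaded files.

```python
import pandas as pd

# Load one AMPds2 electricity meter file (file and column names are assumed, not guaranteed)
df = pd.read_csv("Electricity_WHE.csv")
df["timestamp"] = pd.to_datetime(df["unix_ts"], unit="s")   # per-minute readings, 2012-2014
df = df.set_index("timestamp")

# Aggregate per-minute real power P to hourly means, matching the 24-samples-per-day
# resolution used for forecasting later in the paper
hourly = df["P"].resample("1H").mean()
print(hourly.head())
```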

Figure 5. Bus Diagram.

Figure 6. Overall Load Consumption.

2.2. Temporal Convolutional Network (TCN)

Temporal Convolutional Networks (TCNs) are deep learning architectures that are especially good at modeling and analyzing time series data. TCNs use the abilities of one-dimensional convolutional neural networks (CNNs) to recognize patterns and temporal dependencies in sequential input [39]. TCNs are particularly well suited to applications such as time series forecasting, anomaly detection, and sequence modeling. Unlike standard recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, TCNs do not rely on recurrent connections to account for previous inputs. They use dilated convolutions instead, which give them an extended receptive field and the ability to capture long-term dependencies.

Due to this dilation property, TCNs are easier to train than RNNs and more efficient in terms of redundancy. Additionally, it assists in solving the vanishing gradient issue and enables the network to successfully learn both short- and long-term patterns in historical data. Dilated convolutions are built up in layers within TCNs, followed by non-linear activation functions. These layers capture both local and global trends in the time series by learning hierarchical representations of the incoming data. TCNs may additionally include skip connections, which help the network retain vital information from earlier layers and enhance overall performance.

An overview of the TCNs' architecture is presented below in Fig. 7 and Fig. 8:

Figure 7. Block Diagram of TCN.

Figure 8. TCN block architecture.

Convolutional layers, identical to those utilized for image processing, are at the core of TCN. Using the input sequence as a starting point, these layers train themselves to recognize patterns and characteristics at various temporal scales. TCN processes sequences along the time dimension using 1D convolutions, in contrast to typical convolutional layers for images. Using dilated causal convolutions is one of the features of TCN. To maintain the temporal order of the sequence, causal convolutions ensure that information only flows from the past to the future. Incorporating a dilation factor into the convolutional network, dilated convolutions enable the network to detect correlations across longer distances without drastically raising computation costs. Multiple stacked convolutional layers may be used in TCNs, allowing the network to learn hierarchical representations of the input sequence. Each additional layer has the ability to capture more abstract and advanced information.

TCNs frequently include residual connections to aid training and reduce vanishing gradient issues. These connections make Deep TCN model training feasible, making it easier for gradients to move through the network. Common activation functions, such as ReLU (Rectified Linear Unit), are used to add non-linearity to the model after convolutional processes. To minimize the temporal dimension, TCNs may include pooling layers like max pooling. This can lower the necessary processing and improve the network's capacity to detect significant characteristics.

TCNs frequently have an output layer that generates forecasts or representations of the input sequence. They also have the advantage of processing sequences in parallel, which increases their computational efficiency and makes them a good choice for applications in which speed is essential. To maximize TCN performance on a given task, it is essential to choose appropriate hyperparameters, such as filter sizes, dilation rates, and the number of layers.

TCNs have gained popularity in the machine learning field and have been shown to perform well in a range of sequence modeling applications, particularly time series analysis. They are a useful tool for representing sequential data since they can parallelize both short- and long-term dependencies.
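To make the dilated causal convolution and residual-connection ideas above concrete, the following Keras sketch assembles a small TCN-style model; it is a minimal illustration under assumed settings (64 filters, kernel size 3, dilations 1/2/4, and a 24-step hourly input window), not the exact configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tcn_block(x, filters=64, kernel_size=3, dilation_rate=1):
    """One TCN-style residual block: two dilated causal convolutions plus a skip connection."""
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(x)
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation_rate, activation="relu")(y)
    # 1x1 convolution so the skip connection matches the channel dimension
    skip = layers.Conv1D(filters, 1)(x) if x.shape[-1] != filters else x
    return layers.Add()([y, skip])

inputs = layers.Input(shape=(24, 1))   # 24 hourly samples, one feature (assumed window)
x = inputs
for d in (1, 2, 4):                    # exponentially growing dilation widens the receptive field
    x = tcn_block(x, dilation_rate=d)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1)(x)           # one-step-ahead forecast
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adamax", loss="mse", metrics=["mae"])
```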

2.3. Transformers

Transformer neural networks, a subset of deep learning models, have become quite popular in the natural language processing (NLP) community. Vaswani et al. [40] published a landmark study titled "Attention Is All You Need" that introduced this architecture. In contrast to conventional recurrent neural networks (RNNs), transformers rely only on attention mechanisms to identify dependencies among the various words or tokens in a sequence. They use self-attention, which enables the model to consider the significance of each word in the input sequence as it processes it, improving the modeling of long-range dependencies.

Transformers have been successful in natural language processing and have shown promise in the field of time series analysis.

The capacity of transformers to model long-distance dependencies is one of their primary advantages in time series analysis. Transformers can gather data from distant time points and learn complicated temporal patterns because they are built to recognize relationships between every pair of elements in a sequence. This is especially useful for predicting the stock market, predicting energy use, or forecasting the weather, all of which involve long-term dependencies.

Transformers also perform exceptionally well in parallel computing, enabling effective training and inference on powerful hardware such as GPUs. They are appealing for real-time and large-scale time series analysis owing to this parallelization capacity, which considerably accelerates the modeling process.

Transformers can be used in time series analysis for various purposes, such as forecasting future values, anomaly detection, and pattern recognition. Transformers can consider the whole history of a time series to generate precise predictions and find latent patterns in the data by utilizing self-attention processes and positional encodings.

Although transformer-based models are still an emerging area of study in time series analysis, they have demonstrated promising results and have the potential to offer insightful information and boost forecasting precision in a variety of fields that deal with sequential data over time.

The fundamental structure of transformers comprises two main components: the encoder and the decoder. Fig. 9 illustrates the basic structure of the encoder and decoder components. The input data is fed into the encoder, and the decoder generates the output. Determining the appropriate number of encoders and decoders is a hyperparameter that varies depending on the specific application.

Figure 9. Basic structure of Encoder-Decoder.

Before inputting the data into the encoder, an embedding algorithm is applied to convert the data into vectors of the same dimension. This ensures that all inputs have consistent dimensions. The resulting vectors are then passed to the self-attention layer. Unlike traditional recurrent neural networks (RNNs), all inputs are processed simultaneously rather than sequentially. The self-attention layer ensures that each input is correctly associated, avoiding confusion. This layer facilitates the connection between each value in the data and its corresponding date and time. The detailed step-by-step procedure is outlined below.

  • Step 1: Once the vector dimension is established, three weight matrices, denoted as Wq, Wk, and Wv, are created. These weights are initialized randomly and updated through backpropagation. The inputs are then multiplied by each respective weight matrix, resulting in three separate vectors with a dimension of 64. These vectors are referred to as queries, keys, and values, as shown in equations (1)-(3) below and Fig. 10.
    Q_i = X_i · Wq, i = 1, 2, ..., n (1)
    K_i = X_i · Wk, i = 1, 2, ..., n (2)
    V_i = X_i · Wv, i = 1, 2, ..., n (3)
  • Step 2: Calculating the scores. The scores S_1, ..., S_n are calculated by multiplying each of the queries Q_1, ..., Q_n with the keys K_1, ..., K_n, as shown in equations (4)-(6) below.
    S_1 = Q_1 · [K_1, K_2, ..., K_n] (4)
    S_2 = Q_2 · [K_1, K_2, ..., K_n] (5)
    ...
    S_n = Q_n · [K_1, K_2, ..., K_n] (6)
  • Step 3: The calculated scores from Step 2 are divided by 8, which is the square root of dk (the dimension of the key vectors, 64).

  • Step 4: A softmax layer is applied to the scaled scores, converting them into attention weights that sum to 1. After the softmax, the weight associated with each value vector V_1, ..., V_n indicates how strongly that input relates to the position being processed.

  • Step 5: The softmax weights are then multiplied with the value vectors V_1, ..., V_n from equation (3), and the results give weighted vectors a_1, ..., a_n. All the resultant vectors are summed, which gives the output Z1. This Z1, the output of the self-attention layer, is given as input to the feed-forward neural network layer. A minimal numerical sketch of Steps 1-5 is given after this list.
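As referenced in Step 5, the following is a minimal numerical sketch of Steps 1-5 for a single attention head, written in NumPy; the sequence length of 5 and input dimension of 16 are illustrative assumptions, while the query/key/value dimension of 64 follows Step 1.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (Steps 1-5)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # Step 1: queries, keys, values
    scores = Q @ K.T                        # Step 2: scores of every query against every key
    scores = scores / np.sqrt(K.shape[-1])  # Step 3: divide by sqrt(dk) = 8 when dk = 64
    weights = softmax(scores, axis=-1)      # Step 4: attention weights summing to 1
    return weights @ V                      # Step 5: weighted sum of value vectors -> Z

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                                 # n = 5 time steps, input dim 16 (assumed)
Wq, Wk, Wv = (rng.normal(size=(16, 64)) for _ in range(3))   # random weights; trained in practice
Z = self_attention(X, Wq, Wk, Wv)
print(Z.shape)  # (5, 64): one output vector per input position
```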

Figure 10. Queries, Keys, Values.

The above process is single-head attention, as only one set of weights Wq, Wk, and Wv is used. In multi-head attention, multiple sets of weights are used, as shown in Fig. 11. The main advantage of multi-head attention is that it captures the importance of each input from several representation subspaces. Generally, 8 heads are used in multi-head attention, giving z_0, ..., z_7. All these z values are concatenated and multiplied by another weight matrix W0, which creates the final value Z as shown in Fig. 11.

Figure 11. Multi-Head Attention.

There are additionally two other layers called normalization and positional encoding. These layers are activated when the self-attention and feed-forward neural network layers don't function properly. This means if the feed-forward neural network doesn't function properly, then direct input is given to the normalization layer. Similarly, if the attention layer doesn't function properly, then direct input is given to the positional encoding layer.

The decoder runs repeatedly until it reaches the end of the sequence (EOS). The encoder output is given as input to the decoder, which produces the output O1; O1 is then given as input to the decoder again, which produces the output O2. This process continues until the end of the sequence is reached, as shown in Fig. 12.

Figure 12. Transformers block architecture.

2.4. AdaMax

AdaMax is an extension of the Adam algorithm for gradient descent optimization. It generalizes the approach to the infinite norm (max) and can result in more effective optimization for certain problems. Adam updates weights based on past gradients' scaled L2 norm (squared), while AdaMax extends this to the infinite norm (max) of past gradients. It adopts a separate learning rate for each parameter in the optimization problem. This algorithm maintains moment vectors and exponentially weighted infinity norms for each parameter referred to as n and v, which are initialized to 0.0.

The algorithm is executed iteratively, starting from time t = 1. Each iteration entails computing new parameter values denoted as y, transitioning from y(t-1) to y(t). Initially, the gradient (the partial derivatives of the objective f) is calculated for the current time step, as shown in equation (7) below.

H(t) = ∇f(y(t-1)) (7)

and the moment vector and infinity norm are updated using hyperparameters δ1 and δ2, as shown in (8) and (9).

n(t) = δ1 · n(t-1) + (1 - δ1) · H(t) (8)
v(t) = max(δ2 · v(t-1), abs(H(t))) (9)

The max() function selects the greater value among the parameters, while the abs() function calculates the absolute value.

The parameter value can be updated by breaking it down into three components. The first component calculates the step size parameter, the second component computes the gradient, and the third component utilizes the step size and gradient to determine the new parameter value.

Let's begin by calculating the step size (η) for the parameter, which involves an initial step size hyperparameter known as γ. There is also a decaying version of δ1 over time, with a specific value for this time step referred to as δ1(t):

η(t) = γ / (1 - δ1(t)) (10)

The gradient used to update the parameter is computed in the following manner (11):

Φ(t) = n(t) / v(t) (11)

Lastly, we can determine the value of the parameter for the current iteration by evaluating equation (12):

y(t) = y(t-1) - η(t) · Φ(t) (12)

Alternatively, the comprehensive update equation (13) is expressed as:

y(t) = y(t-1) - (γ / (1 - δ1(t))) · (n(t) / v(t)) (13)

To summarize, there are three hyperparameters associated with the algorithm:

γ: The initial step size or learning rate, typically set to 0.002.

δ1: The decay factor for the first momentum, often set to 0.9.

δ2: The decay factor for the infinity norm, typically set to 0.999.

As mentioned in the original paper, the suggested decay schedule for δ1 involves raising the initial δ1 value to the power of t. However, alternative decay schedules, such as maintaining a constant value or decaying more aggressively, can also be used.
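As a compact illustration, the update rules in equations (7)-(13) can be written directly in Python; the sketch below uses the default hyperparameters listed above and adds a small epsilon to the denominator for numerical stability (an implementation detail not present in the equations). The quadratic objective and its gradient are a toy example, not part of the study.

```python
import numpy as np

def adamax(y, grad_fn, gamma=0.002, d1=0.9, d2=0.999, steps=2000, eps=1e-8):
    """Minimal AdaMax sketch following equations (7)-(13)."""
    n = np.zeros_like(y)   # first-moment vector, initialized to 0.0
    v = np.zeros_like(y)   # exponentially weighted infinity norm, initialized to 0.0
    for t in range(1, steps + 1):
        H = grad_fn(y)                        # eq. (7): gradient at y(t-1)
        n = d1 * n + (1 - d1) * H             # eq. (8): moment update
        v = np.maximum(d2 * v, np.abs(H))     # eq. (9): infinity-norm update
        eta = gamma / (1 - d1 ** t)           # eq. (10): bias-corrected step size
        y = y - eta * n / (v + eps)           # eqs. (11)-(13): parameter update
    return y

# toy usage: minimize f(y) = ||y||^2, whose gradient is 2y
print(adamax(np.array([3.0, -2.0]), grad_fn=lambda y: 2 * y))
```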

3. Result analysis

The AMPds2 dataset considered for the analysis includes 21 diversified loads with uneven consumption patterns. The overall consumption pattern of the loads can be observed in Fig. 6. Forecasting of energy consumption is performed using multiple artificial intelligence methods: RNN architectures such as the simple RNN and LSTM, advanced convolution techniques such as the TCN, and attention networks such as transformers. Each model is trained with various combinations of hyperparameters; activation functions such as linear, tanh, ReLU, and sigmoid are used, along with the Adam and Adamax optimizers. Model performance for all the combinations is evaluated using the RMSE and MAE metrics.
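For reference, the RMSE and MAE reported for each model and hyperparameter combination can be computed as in the short Python sketch below; the arrays here are placeholders, not values from the study.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error between the actual and forecast series."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

def mae(actual, predicted):
    """Mean absolute error between the actual and forecast series."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

# placeholder test and forecast values for one model / activation / optimizer combination
y_test = np.array([0.42, 0.55, 0.61, 0.48])
y_pred = np.array([0.40, 0.57, 0.59, 0.50])
print(f"RMSE = {rmse(y_test, y_pred):.4f}, MAE = {mae(y_test, y_pred):.4f}")
```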

3.1. Deep learning techniques

The deep learning techniques proposed in this study involve reconfiguring and resizing the dataset to match the requirements of the neural network model. The first RNN model used in this study consists of one input layer, two output layers, and two hidden layers, with each hidden layer composed of 40 nodes. The RNN model includes 5362 trainable parameters and 0 non-trainable parameters. Additionally, 30% of the training data is held out for validation. Subsequently, the trained model is tested using separate test data to predict future values. Fig. 13(a,b) displays a comparison of the actual and predicted values of both load and solar. Each hour's data is considered as one sample, resulting in a total of 24 samples per day. Hyperparameters such as the number of epochs and the batch size are fine-tuned to train the model.

Figure 13. RNN - (a) Actual vs. Predicted Load Plot, (b) Actual vs. Predicted Solar Plot.

The LSTM model, in particular, is defined with one input layer, one output layer, and three hidden layers; the first, second, and third hidden layers consist of 120, 80, and 40 nodes, respectively. The defined LSTM model has a total of 142,363 trainable parameters and no non-trainable parameters. Once the model is trained, the test data is given as input, and future values are predicted. Fig. 14(a,b) showcases a comparison between the actual and predicted values of both load and solar for the LSTM model.
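A hedged Keras sketch of a stacked LSTM with the stated hidden-layer sizes (120, 80, and 40 units) is given below; the 24-step input window, single output, loss, and optimizer are assumptions for illustration, so the parameter count will not exactly match the 142,363 reported above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(24, 1)),             # 24 hourly samples, one feature (assumed window)
    layers.LSTM(120, return_sequences=True),  # first hidden layer: 120 units
    layers.LSTM(80, return_sequences=True),   # second hidden layer: 80 units
    layers.LSTM(40),                          # third hidden layer: 40 units
    layers.Dense(1),                          # one-step-ahead forecast
])
model.compile(optimizer="adamax", loss="mse", metrics=["mae"])

# 30% of the training data held out for validation, 50 epochs, batch size 32 (as described)
# model.fit(X_train, y_train, validation_split=0.3, epochs=50, batch_size=32)
```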

Figure 14. LSTM - (a) Actual vs. Predicted Load Plot, (b) Actual vs. Predicted Solar Plot.

Like the RNN and LSTM models, the TCN model is also defined with one input layer, one output layer, and one hidden layer; in the TCN model, this hidden layer consists of 64 nodes. The defined TCN model has a total of 136,642 trainable parameters and no non-trainable parameters.

The model is trained using the number of epochs and the batch size as hyperparameters. During the training process, the loss decreases significantly, by approximately 0.5% per epoch. Fig. 15(a,b) compares the actual and predicted values of both load and solar for the TCN model. In this comparison, it becomes apparent that the TCN model is better trained than the RNN and LSTM models.

Figure 15. TCN - (a) Actual vs. Predicted Load Plot, (b) Actual vs. Predicted Solar Plot.

3.2. Transformers

In this transformer model, after normalization, the encoder input is connected to the multi-head attention layer, and the output of this multi-head layer is given to a feed-forward network with one input layer, one output layer, and three hidden layers. The feed-forward network's output is then given as input to the decoder's multi-head attention layer. Again, as in the encoder, the output of the decoder's multi-head attention is given as input to the decoder's feed-forward neural network, and the output of that network is sent to the softmax layer, which produces the resultant output. In this model, 8 attention heads are used, with 550,985 trainable parameters. Fig. 16(a,b) compares the actual and predicted values of both load and solar for the transformer model.
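One way such an architecture could be sketched in Keras is shown below; this is a simplified, encoder-only illustration with 8 attention heads (no decoder or softmax output stage), and the embedding size, feed-forward width, and input window are assumptions rather than the configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(x, num_heads=8, key_dim=64, ff_dim=128):
    """Transformer-style encoder block: normalization, multi-head attention, feed-forward network."""
    y = layers.LayerNormalization()(x)
    y = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(y, y)
    x = layers.Add()([x, y])                  # residual connection around attention
    y = layers.LayerNormalization()(x)
    y = layers.Dense(ff_dim, activation="relu")(y)
    y = layers.Dense(x.shape[-1])(y)          # project back to the model dimension
    return layers.Add()([x, y])               # residual connection around the feed-forward network

inputs = layers.Input(shape=(24, 1))          # 24 hourly samples, one feature (assumed window)
x = layers.Dense(64)(inputs)                  # embed each time step to a fixed model dimension
x = encoder_block(x)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(1)(x)                  # one-step-ahead forecast
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adamax", loss="mse", metrics=["mae"])
```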

Figure 16. Transformers - (a) Actual vs. Predicted Load Plot, (b) Actual vs. Predicted Solar Plot.

To compare the performance of the RNN, LSTM, and TCN models, Table 2, Table 3, and Table 4 show these models trained using different activation functions (sigmoid, linear, tanh, ReLU), optimizers (Adam, Adamax), and loss functions (mean square error, mean absolute error), with 50 epochs and a batch size of 32. The analysis shows that the TCN model with the Adamax optimizer performs best compared to the RNN and LSTM. The performance metrics of the proposed models are tabulated in Table 5, which shows that transformers outperform the other models (RNN, LSTM, and TCN) in terms of RMSE, MAE, and accuracy. Fig. 17 shows the accuracy comparison of the proposed models.

Table 2. Result Analysis of RNN.

Table 3. Result Analysis of LSTM.

Table 4. Result Analysis of TCN.

Table 5. Overall Comparison of Proposed Models.

Model RMSE MAE Accuracy (%)
RNN 0.081 0.053 93.14
LSTM 0.073 0.0450 94.08
TCN 0.0521 0.033 95.10
Transformers 0.0145 0.0246 99.05

Figure 17. Overall accuracy comparison of different models.

4. Conclusion

This article acknowledges the significance of prediction in the context of Home Energy Management Systems (HEMS) within the framework of a smart grid. The article leverages attention networks' capability to forecast intricate electricity consumption patterns. This research delves into the utilization of various deep learning methods, including Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, Temporal Convolutional Networks (TCN), and transformers, for predicting both electricity load and solar generation. Intriguing conclusions emerge after a thorough performance evaluation across various combinations of hyperparameters. Notably, it is observed that the combination of the AdaMax optimizer with the ReLU activation function produces superior results, demonstrating very low values for both RMSE and MAE; for the TCN model, very low values of RMSE (0.0521) and MAE (0.033) are registered. While this observation may not universally apply to every dataset, it holds for most. Additionally, the choice of a time series forecasting model significantly influences the reduction of errors. Opting for a highly adaptive model proves most effective, particularly when dealing with nonlinear and complex datasets. As mentioned in the literature and demonstrated by our study, the attention-network (transformer) methodology yields the highest accuracy, showcasing an approximately 4% improvement. Achieving an accuracy of 99.05% with the lowest RMSE of 0.0145 and MAE of 0.0246 for a complex dataset like AMPds2 underscores the remarkable adaptability of the model.

4.1. Policy recommendation and limitations

Although transformers have proven effective in their forecasting ability, certain drawbacks and precautions must be considered. Transformers were primarily designed for natural language processing tasks; when they are applied to time series forecasting, certain limitations arise. For high-dimensional or long-horizon data, the complexity of the attention network increases, resulting in longer training times. Transformers require a large amount of data compared to other models for training; otherwise, they may overfit. Data preprocessing steps such as padding and truncation should be performed to handle irregular data samples, and this step may increase design complexity. Therefore, transformers are more suitable for large and complex datasets; implementing an attention mechanism for small and simple applications leads to unnecessary design complexity.

CRediT authorship contribution statement

Mounica Nutakki: Writing – original draft, Software, Methodology, Investigation, Formal analysis, Conceptualization. Srihari Mandava: Visualization, Validation, Supervision, Conceptualization.

Declaration of Competing Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Srihari Mandava reports administrative support, article publishing charges, travel, and writing assistance were provided by Vellore Institute of Technology. Srihari Mandava reports a relationship with Vellore Institute of Technology that includes: employment.

Data availability

The experiments were conducted using publicly available datasets.

References

  • 1.Sunny M.R., Kabir M.A., Naheen I.T., Ahad M.T. 2020 IEEE Green Technologies Conference (GreenTech) IEEE; 2020. Residential energy management: a machine learning perspective; pp. 229–234. [Google Scholar]
  • 2.Raza M.Q., Khosravi A. A review on artificial intelligence based load demand forecasting techniques for smart grid and buildings. Renew. Sustain. Energy Rev. 2015;50:1352–1372. [Google Scholar]
  • 3.Balsalobre-Lorente D., Abbas J., He C., Pilař L., Shah S.A.R. Tourism, urbanization and natural resources rents matter for environmental sustainability: the leading role of ai and ict on sustainable development goals in the digital era. Resour. Policy. 2023;82 [Google Scholar]
  • 4.Abbas J., Rehman S., Aldereai O., Al-Sulaiti K.I., Shah S.A.R. Human Systems Management (Preprint) 2023. Tourism management in financial crisis and industry 4.0 effects: managers traits for technology adoption in reshaping, and reinventing human management systems; pp. 1–18. [Google Scholar]
  • 5.Al-Sulaiti K., Abbas J., Al-Sulaiti I. 2023. Tourists' Online Information Influences Their Dine-Out Behaviour: Country-of-Origin Effects as a Moderator. [Google Scholar]
  • 6.Shah S.A.R., Zhang Q., Abbas J., Balsalobre-Lorente D., Pilař L. Technology, urbanization and natural gas supply matter for carbon neutrality: a new evidence of environmental sustainability under the prism of cop26. Resour. Policy. 2023;82 [Google Scholar]
  • 7.Shah S.A.R., Zhang Q., Abbas J., Tang H., Al-Sulaiti K.I. Waste management, quality of life and natural resources utilization matter for renewable electricity generation: the main and moderate role of environmental policy. Util. Policy. 2023;82 [Google Scholar]
  • 8.Kuo P.-H., Huang C.-J. A high precision artificial neural networks model for short-term energy load forecasting. Energies. 2018;11(1):213. [Google Scholar]
  • 9.Ahmad T., Chen H., Shah W.A. Effective bulk energy consumption control and management for power utilities using artificial intelligence techniques under conventional and renewable energy resources. Int. J. Electr. Power Energy Syst. 2019;109:242–258. [Google Scholar]
  • 10.Kabir A., Sunny M.R., Siddique N.I. 2021 IEEE International Conference in Power Engineering Application (ICPEA) IEEE; 2021. Assessment of grid-connected residential pv-battery systems in Sweden-a techno-economic perspective; pp. 73–78. [Google Scholar]
  • 11.Malekizadeh M., Karami H., Karimi M., Moshari A., Sanjari M. Short-term load forecast using ensemble neuro-fuzzy model. Energy. 2020;196 [Google Scholar]
  • 12.Almazrouee A.I., Almeshal A.M., Almutairi A.S., Alenezi M.R., Alhajeri S.N. Long-term forecasting of electrical loads in Kuwait using prophet and Holt–Winters models. Appl. Sci. 2020;10(16):5627. [Google Scholar]
  • 13.Deb C., Zhang F., Yang J., Lee S.E., Shah K.W. A review on time series forecasting techniques for building energy consumption. Renew. Sustain. Energy Rev. 2017;74:902–924. [Google Scholar]
  • 14.Gupta A., Kumar A. 2020 IEEE International Conference on Environment and Electrical Engineering and 2020 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe) IEEE; 2020. Mid term daily load forecasting using arima, wavelet-arima and machine learning; pp. 1–5. [Google Scholar]
  • 15.Patel N., Patel M., Patel R. Electrical energy demand forecasting using time series approach. Int. J. Technol. Glob. 2020;29(3s):594–604. [Google Scholar]
  • 16.Dodamani S., Shetty V., Magadum R. 2015 International Conference on Technological Advancements in Power and Energy (TAP Energy) 2015. Short term load forecast based on time series analysis: a case study; pp. 299–303. [DOI] [Google Scholar]
  • 17.Newsham G.R., Birt B.J. Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building. 2010. Building-level occupancy data to improve arima-based electricity use forecasts; pp. 13–18. [Google Scholar]
  • 18.Vapnik V. Springer Science & Business Media; 1999. The Nature of Statistical Learning Theory. [Google Scholar]
  • 19.Hong W.-C. Electric load forecasting by support vector model. Appl. Math. Model. 2009;33(5):2444–2454. [Google Scholar]
  • 20.Ahmad W., Ayub N., Ali T., Ifran M., Awais M., Shiraz M., Glowacz A. Towards short term electricity load forecasting using improved Support Vector Machine and Extreme Learning Machine. Energies. 2020;13(11) [Google Scholar]
  • 21.Xuemei L., Jin-hu L., Lixing D., Gang X., Jibin L. vol. 1. IEEE; 2009. Building Cooling Load Forecasting Model Based on LS-SVM; pp. 55–58. (2009 Asia-Pacific Conference on Information Processing). [Google Scholar]
  • 22.Hong T. North Carolina State University; 2010. Short Term Electric Load Forecasting. [Google Scholar]
  • 23.Zhi Z., Manlong Z., Chen Z., Liangchang H. Research on nearest neighbors classification techniques. J. Front. Comput. Sci. Technol. 2011;5(5):467. [Google Scholar]
  • 24.Lei S.-l., Sun C.-x., Zhou Q., Zhang X.-x. 2005 IEEE Russia Power Tech. IEEE; 2005. The research of local linear model of short term electrical load on multivariate time series; pp. 1–5. [Google Scholar]
  • 25.Brown M., Barrington-Leigh C., Brown Z. Kernel regression for real-time building energy analysis. J. Build. Perform. Simul. 2012;5(4):263–276. [Google Scholar]
  • 26.Al-Qahtani F.H., Crone S.F. The 2013 International Joint Conference on Neural Networks (IJCNN) IEEE; 2013. Multivariate k-nearest neighbour regression for time series data—a novel algorithm for forecasting uk electricity demand; pp. 1–8. [Google Scholar]
  • 27.Nizami S.J., Al-Garni A.Z. Forecasting electric energy consumption using neural networks. Energy Policy. 1995;23(12):1097–1104. [Google Scholar]
  • 28.Gonzalez P.A., Zamarreno J.M. Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy Build. 2005;37(6):595–601. [Google Scholar]
  • 29.Schenker B.G.E. ETH; Zurich: 1996. Prediction and control using feedback neural networks and partial models. Ph.D. thesis. [Google Scholar]
  • 30.Panapakidis I.P., Dagoumas A.S. Day-ahead electricity price forecasting via the application of artificial neural network based models. Appl. Energy. 2016;172:132–151. [Google Scholar]
  • 31.Zhao W., Niu D. 2010 2nd IEEE International Conference on Information Management and Engineering. IEEE; 2010. A mid-long term load forecasting model based on improved grey theory; pp. 633–635. [Google Scholar]
  • 32.Popov V., Fedosenko M., Tkachenko V., Yatsenko D. 2019 IEEE 6th International Conference on Energy Smart Systems (ESS) IEEE; 2019. Forecasting consumption of electrical energy using time series comprised of uncertain data; pp. 201–204. [Google Scholar]
  • 33.Vats V.K., Rai S., Bharti D., De M. 2018 4th International Conference on Computing Communication and Automation (ICCCA) IEEE; 2018. Very short-term, short-term and mid-term load forecasting for residential academic institute: a case study; pp. 1–6. [Google Scholar]
  • 34.Chou J.-S., Tran D.-S. Forecasting energy consumption time series using machine learning techniques based on usage patterns of residential householders. Energy. 2018;165:709–726. [Google Scholar]
  • 35.Zhuang J., Chen Y., Shi X., Wei D. Building cooling load prediction based on time series method and neural networks. Int. J. Grid Distrib. Comput. 2015;8(4):105–114. [Google Scholar]
  • 36.Huang C.-M., Huang C.-J., Wang M.-L. A particle swarm optimization to identifying the armax model for short-term load forecasting. IEEE Trans. Power Syst. 2005;20(2):1126–1133. [Google Scholar]
  • 37.Nie H., Liu G., Liu X., Wang Y. Hybrid of arima and svms for short-term load forecasting. Energy Proc. 2012;16:1455–1460. [Google Scholar]
  • 38.Makonin S., Ellert B., Bajić I.V., Popowich F. Electricity, water, and natural gas consumption of a residential house in Canada from 2012 to 2014. Sci. Data. 2016;3(1):1–12. doi: 10.1038/sdata.2016.37. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Lim B., Zohren S. Time-series forecasting with deep learning: a survey. Philos. Trans. R. Soc. A. 2021;379(2194) doi: 10.1098/rsta.2020.0209. [DOI] [PubMed] [Google Scholar]
  • 40.Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A., Kaiser L., Polosukhin I. NIPS. 2017. Attention is all you need. [Google Scholar]
