Heuristic based federated learning with adaptive hyperparameter tuning for households energy prediction

Liana Toderean; Mihai Daian; Tudor Cioara; Ionut Anghel; Vasilis Michalakopoulos; Efstathios Sarantinopoulos; Elissaios Sarmas

doi:10.1038/s41598-025-96443-3

2025 Apr 12;15:12564. doi: 10.1038/s41598-025-96443-3


PMCID: PMC11993608 PMID: 40221586

Abstract

Federated Learning is transforming electrical load forecasting by enabling Artificial Intelligence (AI) models to be trained directly on household edge devices. However, the prediction accuracy of federated learning models tends to diminish when dealing with non-IID data, highlighting the need for adaptive hyperparameter optimization strategies to improve performance. In this paper, we propose a novel hierarchical federated learning solution for efficient model aggregation and hyperparameter tuning, specifically tailored to household energy prediction. Households with similar energy profiles are clustered at the edge, linked, and aggregated at the fog level to enable effective and adaptive hyperparameter tuning. The federated model aggregation is optimized using hierarchical simulated annealing to prioritize updates from the better-performing models. A genetic algorithm-based hyperparameter optimization method reduces the computational load on edge nodes by efficiently exploring different configurations and using only the most promising ones for cross-validation on edge nodes. The evaluation results demonstrate a significant improvement in average prediction accuracy and better capture of energy patterns compared to the federated averaging approach. The impact on network traffic among nodes across different layers is kept below 30 KB. Additionally, hyperparameter tuning reduces the size of model updates and the number of communication rounds by 30%, which is particularly beneficial when network resources are limited.

Keywords: Federated learning, Energy prediction, Hyperparameters optimization, Simulated Annealing-based federated aggregation, Genetic algorithm

Subject terms: Energy grids and networks, Computer science, Information technology

Introduction

Energy prediction is significant for modern power grids, ensuring their efficient operation, mitigating instability, and optimizing resource allocation and renewable energy source integration¹. In recent years, progress has been made in ML forecasting for energy prediction^2,3. The accuracy and reliability of energy forecasts have been improved by leveraging sophisticated models and large datasets to anticipate demand and supply fluctuations more precisely. However, large amounts of data are utilized in the training process to create effective prediction models. Since the household’s energy data contains sensitive information about individuals’ behaviours, ensuring privacy in learning while still achieving good performance is an open research topic⁴. Even with strong privacy and security guarantees, the households’ residents are often reluctant to grant access to their energy data for storage in centralized cloud silos, where it can be further processed and used for model training purposes⁵.

Recently, Federated Learning (FL) has emerged as a promising approach in the field of energy prediction, particularly for electrical load forecasting. It enables local prediction model training on data collected and stored on household devices at the edge and offers advantages for training models on distributed data, including improved efficiency and enhanced data privacy. Taïk et al.⁶ conducted one of the first studies on electrical load forecasting using edge computing and FL. They employed long short-term memory (LSTM) networks in a federated scenario to predict residential load for 200 houses in Texas. Their approach highlighted the benefits of personalization through re-training, achieving a 5% performance increase in terms of root mean square error (RMSE) and mean absolute percentage error (MAPE). Similarly, Liu et al.⁷ introduced an FL framework for smart grids, integrating power consumption data with weather features from 60 transformer stations in Zhuhai, China. This study utilized LSTMs and boosting trees, comparing horizontal and vertical FL models using the mean squared error (MSE) as the performance metric. The work emphasized the importance of securing power traces in collaborative learning environments.

Further research indicated the diminishing performance of FL when dealing with non-independent and identically distributed (non-IID) data^8,9. This prompted several researchers to experiment with clustering techniques. Savi et al.¹⁰ explored short-term load forecasting (STLF) at the edge, using FL and clustering methodologies. The prediction model was based on LSTMs and incorporated weather data. They compared FL with clustering learning in terms of accuracy, impact of clustering, scalability, and communication cost, with the K-means FL model achieving the best performance in most metrics. Briggs et al.¹¹ conducted a similar study with an LSTM-based model enhanced with weather data. They tested several scenarios, comparing FL, centralized learning, local learning, and Hierarchical Clustering (HC). The results showed that FL approaches outperformed centralized learning but underperformed local learning. However, with a personalization step, FL and its clustered variant (FL + HC) improved performance by up to 5% over localized learning while maintaining data privacy. Additionally, FL + HC with fine-tuning significantly reduced computational demands, requiring up to 10 times fewer samples for optimal model performance. He et al.¹² tested residential STLF on 250 households from Australia, using LSTM models and K-means clustering in a federated setting. Their study showcased the importance of clustering and indicated that FL can be particularly useful for collaborative training when some users lack historical data. More advanced clustering techniques have been used in^13,14. Tun et al.¹³ implemented bi-directional LSTM models with ordering points to identify the clustering structure (OPTICS) for STLF on data from 22 households in British Columbia. Their comparison between clustered and non-clustered approaches revealed the benefits of clustering in improving forecast accuracy. 
Gholizadeh et al.¹⁴ introduced hyperparameter-based clustering for electrical load forecasting on 75 households in Edmonton, comparing FL with centralized and local learning using RMSE. The results revealed that the clustering method significantly reduced the convergence time and that FL performed worse than local learning and better than centralized learning in individual load forecasting. Fernández et al.¹⁵ focused on privacy-preserving FL for residential STLF, testing various architectures and scenarios. Their findings suggest that FL performs worse than centralized learning in terms of accuracy, although the performance of FL increases proportionally with the number of participating clients. Additionally, clustering methods enhance forecasting accuracy, while complex model architectures involve high computational costs and pose risks of overfitting. Duttagupta et al.¹⁶ explored lightweight FL for distributed load forecasting using a feedforward neural network model, demonstrating that lightweight models can achieve performance comparable to that of more complex architectures. The experiments highlighted the potential of FL in reducing computational costs while maintaining accuracy.

A limited number of studies have experimented with variations of the federated aggregation algorithms in energy prediction. Wang et al.¹⁷ introduced SecFedAProx-LSTM, an adaptive FL framework for multiparty wind power forecasting, based on an LSTM model, a variation of the FedProx framework, and secure aggregation. Their method demonstrated three key advantages. It provided more accurate and reliable forecasts compared to Multilayer Perceptron, Convolutional Neural Network, Recurrent Neural Network, and Gated Recurrent Unit (GRU) models. It achieved faster convergence and improved accuracy in the presence of statistical heterogeneity compared to FedProx, especially as the number of clients increased. Additionally, it ensured privacy without requiring a third party for key generation, using Decentralized Multi-Client Functional Encryption for secure aggregation. Fekri et al.¹⁸ experimented with two federated aggregation algorithms: FedSGD and FedAVG. Both achieved higher accuracy than individual and central models for one-hour forecasting, with FedAVG slightly better. For 24-hour forecasting, FedAVG outperformed all methods, while FedSGD had convergence issues. The approach maintained high accuracy even when new smart meters joined post-training. Some approaches aim to ensure a more efficient federated model aggregation. Hu Y. et al.¹⁹ propose an aggregation method that considers the characteristics of the individual datasets of the training nodes, enabling participants to make element-wise contributions to improve the learning performance and convergence speed. Hu Z. et al. propose in²⁰ a multi-objective optimization approach for FL that converges to Pareto stationary solutions. The aggregation algorithm considers individual objectives and the overall collaborative objective. Chifu et al.²¹ introduced FedWOA, an FL model for predicting renewable energy production using time series data from local prosumer nodes. 
Utilizing the Whale Optimization Algorithm (WOA) to aggregate LSTM model weights, FedWOA addresses data heterogeneity and variations in generation patterns. With Kmeans clustering for non-IID data management, FedWOA improved prediction accuracy by 25% for MSE and 16% for Mean absolute error (MAE) compared to FedAVG, demonstrating good convergence and reduced loss. This approach enables precise forecasts for small-scale energy prosumers through decentralized data and collaborative global model optimization.

Finally, the hyperparameters of local models may significantly impact the performance of FL for energy prediction. Improving the selection of hyperparameters such as learning rate, batch size, or number of epochs, and dynamically adjusting them, can increase convergence speed and enhance the learning of local models²². However, communication overhead and convergence speed between the edge devices and the cloud server may affect the prediction accuracy and training efficiency²³. Heuristic-based approaches are often used to find the optimal hyperparameter settings, as they efficiently explore large search spaces by balancing exploration and exploitation in finding the optimal configuration²². Kundroo et al.²⁴ highlight the importance of selecting the appropriate configuration of hyperparameters for both model performance and training efficiency. In their case, the clients are responsible for hyperparameter optimization, dynamically adjusting the learning rate and number of epochs according to the model training loss. Qolomany et al. propose a Particle Swarm Optimization algorithm for hyperparameter tuning of deep long short-term memory models²⁵. The number of communication rounds needed to find the best solution is reduced compared to a grid search method. Al-Wesabi et al.²⁶ use the Pelican Optimization Algorithm to fine-tune the hyperparameters of a belief network for attack detection on local IoT devices. A heuristic approach for hyperparameter tuning was applied to spiking neural networks in²⁷. This type of neural network has many hyperparameters, and the Cuckoo Search Algorithm, Grasshopper Optimization Algorithm, and Polar Bears Algorithm were tested for their optimization. The Orchard meta-heuristic optimization algorithm is proposed by Bukhari et al. in²⁸ for hyperparameter tuning of an FL model that predicts photovoltaic power generation. 
The optimization problem solutions are composed of architectural information for the proposed Conv-SGRU model, the learning rate, and the dropout rate. Michalakopoulos et al.²⁹ propose a federated framework for collaborative model training across decentralized prosumer energy data without compromising sensitive information. They leverage clustering algorithms that utilize the models’ hyperparameters as the input space and integrate a differential privacy aggregator. Privacy-preserving transfer learning for short-term building energy consumption prediction is addressed in³⁰. The federated model learns transferable knowledge, and the hyperparameter fine-tuning is performed during the training phase using a grid search algorithm to find the optimal configuration of the model architecture, learning rate, and optimizer. The grid search algorithm is also used for hyperparameter selection in³¹ in different FL settings for residential energy consumption prediction.

The paper explores a novel hierarchical FL solution for households’ energy consumption prediction that incorporates clustering techniques, simulated annealing (SA), and genetic algorithms (GAs) for efficient model aggregation and hyperparameter tuning. We address the challenge of effective and adaptive hyperparameter tuning for heterogeneous energy profiles by using a clustering technique. Similar energy profiles are grouped and linked for aggregation at the fog level. The GA efficiently explores the hyperparameter configurations, selecting and sending only the most promising ones to the validation nodes for evaluation. Additionally, there is a need for effective hyperparameter tuning methods that can scale to numerous households and massive datasets. These methods should be capable of handling diverse FL deployments and should consider the limited computational resources available at the edge. To address this gap, a hierarchical SA optimization is used as an efficient aggregation method at the fog and cloud layers. The method improves performance by prioritizing updates from the better-performing models and enhances training efficiency by focusing on early updates. Finally, the GA-based hyperparameter optimization process reduces the computational effort of edge nodes by using only one hyperparameter configuration at a time for training and validation. In this way, we address significant challenges in FL, such as optimizing the communication between edge devices and the fog/cloud to reduce overhead while maintaining the prediction performance of the global model. This is especially relevant for households’ energy consumption prediction, where the energy data is non-IID and a node with a larger dataset and higher energy profile magnitude should not necessarily have a greater influence on the global model. 
Additionally, it is important to consider that, especially in the early stages, some prediction models may perform poorly on edge nodes but still contribute positively to the global model.

The remainder of the paper is structured as follows: the Methods section introduces the proposed FL solution for households’ energy prediction, the Results section details the evaluation and validation results, and the Conclusion section summarizes the paper and highlights future work.

Methods

Figure 1 presents the proposed three-layer FL architecture for energy consumption prediction over a set of households. The edge nodes are gateway devices located in buildings, which are used to train local prediction models on the data stored locally. These devices then send updates of their learned models to the upper fog layer. Since households have different energy consumption profiles with varying patterns and amplitudes, their effective grouping into distinct clusters is important for prediction accuracy. To this end, we used our clustering solution from³², with one change: the extra features related to peak demand hours were removed, as they play no role in understanding the time series patterns that we are trying to categorize. Each fog device is therefore associated with one cluster of households, enabling these households to contribute to a shared prediction model on the fog layer. The top cloud layer is responsible for efficiently aggregating the fog layer updates into a global prediction model.

Fig. 1 — Layered FL architecture for energy prediction.

One iteration consists of a round of communication from the top-layer cloud to the fog clusters, from each cluster to its households, and back. A fixed total number of iterations is run to complete the training of the global federated model. The cloud layer is responsible for initializing and storing the global weights after each iteration, together with a set of hyperparameter configurations of the global model. It also maintains a cumulative hyperparameter for the cloud model and the computed performance of the global model at each iteration. Each fog-layer cluster holds a set of hyperparameter configurations, from which it selects the best configuration and sends it to the edge layer. Each cluster likewise maintains a cumulative hyperparameter for its model and the performance of that model at each iteration. Additionally, the weight vector associated with the cluster is updated at each iteration by aggregating the weights received from each edge node. Finally, the household edge nodes are responsible for training and validating the model. They receive the initial weights and configuration from the fog, update the weights, and evaluate their performance under the current hyperparameter configuration. Each edge node reports the weights it computed and the performance of its updated model at the current iteration.

For each cluster, the edge nodes are split into training nodes and validation nodes. We define the learning of the global federated model as a multi-objective optimization problem. On the edge layer, the objective of each training node is to minimize the loss on its training dataset, given the weights of the local model and the best hyperparameter configuration sampled from the set of fog configurations.

On the fog layer, the objective at each cluster is to minimize the sum of the losses computed on both training and validation edge nodes. This involves minimizing the total loss from all household nodes in the cluster by aggregating the weights from edge nodes within the cluster and selecting the optimal hyperparameter configuration for training. The terms of this objective are the cluster of edge nodes, the hyperparameter configurations available to the cluster, and the set of edge models in the cluster.

The cloud layer’s global objective is to minimize the overall loss on all edge nodes by efficiently aggregating the updates received from the fog nodes.

In other words, the optimization problem is to efficiently aggregate the model weights both on fog and cloud layers and to find the best hyperparameter configuration of nodes such that the sum of edge node training and validation losses is minimized.
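
The three objectives described in this section can be sketched in the following form. The notation is an assumption introduced only for illustration and is not taken from the original equations: $w_i$ denotes the weights of edge node $i$, $D_i$ its local dataset, $N_c$ the edge nodes of cluster $c$, $N_c^{\mathrm{train}}$ its training nodes, $H_c$ the hyperparameter configurations of cluster $c$, $h_c^{*}$ the configuration selected by the fog node, $w_c$ the cluster-level weights, $w_g$ the global weights, and $\mathcal{L}$ the loss.

```latex
% Edge layer: each training node minimizes its local training loss
\min_{w_i} \; \mathcal{L}\left(w_i;\, D_i^{\mathrm{train}},\, h_c^{*}\right),
\qquad i \in N_c^{\mathrm{train}}

% Fog layer: each cluster minimizes the summed loss of its edge nodes
\min_{w_c,\; h_c \in H_c} \; \sum_{i \in N_c} \mathcal{L}\left(w_c;\, D_i,\, h_c\right)

% Cloud layer: the global model minimizes the loss over all edge nodes
\min_{w_g} \; \sum_{c} \sum_{i \in N_c} \mathcal{L}\left(w_g;\, D_i\right)
```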

Federated learning methodology

Figure 2 presents the computational and communication steps involved in the FL process. First, the cloud initializes the global weights and sets the current temperature of the simulated annealing (SA) process to the maximum temperature (1). The fog nodes also initialize the GA population of chromosomes (2).

The following steps are repeated for the configured number of iterations to achieve the overall objective. The current global weights and the current temperature are broadcast to all the fog nodes (3). The model weights at the fog level from the previous round are updated with the weights received from the cloud (4). Each fog node randomly selects one of its connected edge nodes for validation (5). The hyperparameter tuning process consists of updating the population (6.1) and communicating with the validation edge node to evaluate the chromosomes. Chromosomes are selected for evaluation with a given probability and are sent to the validation edge node together with the aggregated weights from the previous round (6.2). The validation edge node evaluates the hyperparameter configuration represented by each chromosome on the given weights and its validation data (6.3) and sends the fitness score back to the fog node (6.4). The GA, including the population update process (offspring generation, removal of the worst candidates, and fitness score computation), is detailed in the Hyperparameters tuning section. The fog nodes select the best chromosome from the population based on the fitness score (6.5). The weights and the best chromosome are broadcast to all the training edge nodes of the cluster (7). Each edge node trains the model with the given hyperparameter configuration on its dataset (8) and sends the updated weights and its performance to the fog node (9). Using the SA process, the fog node aggregates the received updates (10) and sends the aggregated model weights and performance to the cloud (11). Finally, the cloud aggregates the model updates received from the fog nodes (12), and the process is repeated for the remaining iterations.

Prediction models aggregation

We have defined a SA³³ based aggregation solution considering the model performance and allowing for a larger exploration space in the early stages. SA searches for the optimal solution by accepting solutions that are worse than the current one with a probability that is higher at the beginning of the process and decreases over time, controlled by a temperature parameter. Therefore, as the federated learning process progresses the probability of considering models with lower accuracy in aggregation decreases.

The models are updated based on local performance and previous participation, specifically how early they provided their solutions, using a cumulative hyperparameter (see Algorithm 1). Nodes that contributed to the global model more promptly are rewarded with a higher weight in the aggregation process. The algorithm takes as input a set of weights and their performances, the cumulative hyperparameter from the previous round, the aggregated model weights and performance from the previous round, and the current temperature (line 2).

Algorithm 1: SA Aggregation

The method returns a new set of aggregated weights, the performance of the aggregated model, and the updated cumulative hyperparameter. First, the algorithm computes two factors: one based on the current cumulative hyperparameter and one based on the current temperature (line 5). Afterward, for each set of weights (line 6), the difference between the performance of the previous aggregated model and that of the current update is computed, and a random number between 0 and 1 is drawn (lines 7–8). The update is aggregated if its performance is higher than that of the previous aggregated model or, otherwise, with a probability influenced by the performance difference, the temperature, and a constant (lines 9–11). The contributions of the new weights and of the current aggregated model to the result are weighted accordingly, and the cumulative hyperparameter is updated. Finally, the performance of the aggregated weights is computed as the maximum of the previous performance of the aggregated model and the performances of all the updated models (line 14). The Boltzmann constant is used so that the acceptance test follows the Boltzmann probability distribution: the random value is compared against the probability that the system is found in a state with the given performance difference, so that, depending on the temperature, the search moves toward better states or accepts random ones.
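
The aggregation logic described above can be illustrated with a minimal Python sketch. It is not a transcription of Algorithm 1: the blending factor `alpha`, the rule used to update the cumulative hyperparameter, and the constant `k` (standing in for the Boltzmann constant) are assumptions chosen only to make the acceptance test and the weighted blending concrete. Model weights are represented as flat lists of floats, and higher performance is assumed to be better.

```python
import math
import random


def sa_aggregate(updates, agg_weights, agg_perf, cumulative, temperature,
                 k=1.0, rng=random):
    """Blend node updates into the aggregated model with an SA acceptance test.

    updates: list of (weights, performance) pairs; higher performance is better.
    """
    # Assumed blending factor: it shrinks as the cumulative hyperparameter grows,
    # so updates accepted in early rounds carry more weight.
    alpha = 1.0 / (1.0 + cumulative)
    scale = k * temperature  # assumed temperature factor (must be > 0)
    new_weights = list(agg_weights)
    for weights, perf in updates:
        delta = agg_perf - perf  # positive when the update is worse
        r = rng.random()         # uniform draw in [0, 1)
        # Accept better updates always, worse ones with Boltzmann probability.
        if perf > agg_perf or r < math.exp(-delta / scale):
            new_weights = [(1.0 - alpha) * a + alpha * w
                           for a, w in zip(new_weights, weights)]
            cumulative += alpha  # assumed update rule for the hyperparameter
    # Performance is the best value seen among the previous model and all updates.
    new_perf = max([agg_perf] + [perf for _, perf in updates])
    return new_weights, new_perf, cumulative
```

In a hierarchical deployment, the same routine could be called once per fog node on the edge updates and once at the cloud on the fog updates. At a high temperature almost every update is blended in, while at a low temperature only updates that beat the current aggregated model are accepted.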

Hyperparameters tuning

The GA³⁴ is used to find the best hyperparameter configuration for each fog node corresponding to a cluster. The population is initialized with a set of hyperparameter configurations, each of which is a chromosome whose genes are the learning rate, the batch size, the number of epochs, the early stopping patience, and the number of fine-tuning layers.

The genes represent hyperparameters that significantly influence model performance in federated energy prediction tasks. The learning rate is tuned to find a balance between stable convergence and faster training; the batch size allows for exploring different trade-offs between computational efficiency and capturing complex consumption patterns; the number of epochs ensures flexibility in fitting seasonal and varying consumption behaviours without overfitting; the early stopping patience helps to detect convergence and prevent unnecessary training, accommodating data irregularities; and the number of fine-tuning layers controls how much of the pre-trained model is adapted to local conditions. For population initialization, individuals are randomly generated with each hyperparameter value drawn from its defined range, enabling a broad search space for discovering effective configurations.

The GA-based hyperparameter tuning is defined in Algorithm 2. It receives the current temperature from the SA aggregation, the population of chromosomes, the validation node, and the current cluster-level aggregated weights. First, the candidates for crossover are selected as the best two hyperparameter configurations in the population (line 6). Two new offspring are generated by crossover between the selected candidates (line 7) and are added to the survivor population (line 8). Single-point crossover is used for offspring generation, which involves swapping segments of the two parent chromosomes at a random point. We set the crossover probability to 60%, meaning that for a given pair of parents there is a 60% chance that crossover is applied to produce offspring. Given two parent chromosomes of equal length and a crossover point, each offspring takes the genes up to the crossover point from one parent and the remaining genes from the other parent.
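
Single-point crossover as described above can be written as follows. The symbols are assumptions introduced for illustration: $p^{1}$ and $p^{2}$ are the parent chromosomes, $o^{1}$ and $o^{2}$ the offspring, $k$ the crossover point, and $n$ the chromosome length, with subscripts indexing genes.

```latex
o^{1} = \left(p^{1}_{1}, \ldots, p^{1}_{k},\; p^{2}_{k+1}, \ldots, p^{2}_{n}\right),
\qquad
o^{2} = \left(p^{2}_{1}, \ldots, p^{2}_{k},\; p^{1}_{k+1}, \ldots, p^{1}_{n}\right),
\qquad 1 \le k < n
```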

Algorithm 2: GA Hyperparameter Tuning Method

For the mutation process, each gene in the offspring has a 3% probability of being changed to a random value from its domain. After the offspring are generated, the new population is obtained by replacing the two chromosomes with the lowest fitness scores with the newly generated offspring (lines 8–9). Only some of the chromosomes in the population are selected in the current iteration to be evaluated on the validation edge node (lines 10–15). The probability that a chromosome is selected depends on a randomly generated value (line 11), its current fitness score, a constant, and the temperature (line 12). For each selected chromosome in the new population, the randomly chosen validation edge node receives the current cluster-level aggregated weights and the hyperparameter configuration corresponding to the chromosome in order to compute the fitness score. The fitness score is determined by computing the loss obtained when fitting the model with the received weights and hyperparameters (line 13). If a chromosome is not selected for evaluation, its previous fitness is kept. Finally, the algorithm returns the new population (line 16).
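
One GA iteration at a fog node, as described above, might look like the following Python sketch. Several details are assumptions made for illustration rather than the authors' implementation: the discrete search ranges in `SEARCH_SPACE`, the convention that fitness is the negative validation loss (so the lowest fitness is the worst), the temperature-dependent selection rule `exp(fitness / (k * temperature))`, and the choice to always evaluate freshly created offspring. The callback `evaluate_loss` stands in for the round trip to the validation edge node.

```python
import math
import random

# Hypothetical discrete search ranges; the actual ranges are not specified here.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "epochs": [5, 10, 20],
    "patience": [2, 3, 5],
    "fine_tune_layers": [1, 2, 3],
}
GENES = list(SEARCH_SPACE)


def single_point_crossover(p1, p2, rng, p_cross=0.6):
    """Swap the tails of two parents at a random point with probability p_cross."""
    if rng.random() >= p_cross:
        return list(p1), list(p2)
    k = rng.randrange(1, len(p1))  # crossover point strictly inside the chromosome
    return p1[:k] + p2[k:], p2[:k] + p1[k:]


def mutate(chrom, rng, p_mut=0.03):
    """Resample each gene from its domain with probability p_mut."""
    return [rng.choice(SEARCH_SPACE[g]) if rng.random() < p_mut else v
            for g, v in zip(GENES, chrom)]


def ga_step(population, fitness, evaluate_loss, temperature, rng, k=1.0):
    """Run one population update followed by selective re-evaluation."""
    population, fitness = list(population), list(fitness)
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    # The two best chromosomes are the parents.
    children = single_point_crossover(population[ranked[0]], population[ranked[1]], rng)
    # The two worst chromosomes are replaced by the mutated offspring.
    for idx, child in zip(ranked[-2:], children):
        population[idx] = mutate(child, rng)
        fitness[idx] = None  # marks a chromosome that has never been evaluated
    for i, chrom in enumerate(population):
        # Assumed selection rule: new offspring are always evaluated, others
        # with a probability that depends on fitness and temperature.
        p_select = 1.0 if fitness[i] is None else math.exp(fitness[i] / (k * temperature))
        if rng.random() < p_select:
            fitness[i] = -evaluate_loss(chrom)  # fitness = negative validation loss
    return population, fitness
```

Because only the selected chromosomes are sent for evaluation, the validation node fits a small number of configurations per round, which matches the stated goal of limiting computation at the edge.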

Results and discussion

The dataset used for evaluation contains energy consumption readings from over 4000 London households, with a subset of these households participating in a time-of-use demand response program³⁵. The data is recorded at 30-minute intervals between November 2011 and February 2014 and provides insights into energy consumption patterns, tariffs, and responses to price signals. Figure 3 illustrates the hourly average energy consumption of households over the data collection period. Distinct groups of houses can be identified based on their energy usage levels. Additionally, there are significant peaks in energy consumption during the day, primarily occurring early in the morning and in the evening. These peaks can be attributed to the unique consumption patterns of each household.

Fig. 3 — Hourly energy consumption data for each household (recorded daily across the dataset).

The monitored data often exhibit imperfections, such as incompleteness, inconsistency, and inaccuracy, as well as errors, outliers, or missing values. To improve data quality and uncover meaningful relationships within the dataset, a data cleaning process was undertaken before data analysis. Initially, data points with missing or erroneous values were removed to ensure data integrity, resulting in a final sample of 4,438 households for this study.

For solution evaluation, a wide array of features was considered to capture various aspects of energy consumption patterns. These features are categorized into several groups, each contributing uniquely to the predictive power of the federated model. We considered temporal features such as the hour of the day, day of the week, and month of the year to capture daily, weekly, and seasonal patterns in energy consumption. To capture short-term trends and variability, statistical features such as rolling means and rolling maximum and minimum values were used (see Table 1).

Table 1.

Input features for federated energy prediction model.

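Type	Feature	Description
Timestamp	Hour, minute	Extracted from the monitoring date, with sine and cosine computed for their values
Timestamp	Day, month	Extracted from the monitoring date, with sine and cosine computed for their values
Timestamp	Weekday	Extracted from the monitoring date, with sine and cosine computed for their values
Statistical	Rolling mean value	Computed on the consumption values for time windows of sizes 3 and 6
Statistical	Rolling maximum/minimum value	Computed on the consumption values for time windows of sizes 3 and 6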
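
As an illustration of the features in Table 1, the following Python sketch computes sine and cosine encodings of the timestamp fields and rolling statistics over windows of 3 and 6 readings. The function and key names are hypothetical, the periods used for each field (for example 31 for the day of the month) are assumptions, and the windows are taken as trailing windows that end at the current reading; how the windows are aligned with the prediction target is not specified in the text.

```python
import math
from datetime import datetime, timedelta


def cyclical(value, period):
    """Encode a periodic value as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def rolling(values, window, fn):
    """Apply fn over trailing windows; positions without a full window give None."""
    return [fn(values[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(values))]


def mean(xs):
    return sum(xs) / len(xs)


def build_features(timestamps, consumption, windows=(3, 6)):
    """Return one feature dictionary per reading."""
    stats = {}
    for w in windows:
        stats[f"roll_mean_{w}"] = rolling(consumption, w, mean)
        stats[f"roll_max_{w}"] = rolling(consumption, w, max)
        stats[f"roll_min_{w}"] = rolling(consumption, w, min)
    rows = []
    for i, ts in enumerate(timestamps):
        row = {}
        for name, value, period in (
            ("hour", ts.hour, 24), ("minute", ts.minute, 60),
            ("day", ts.day - 1, 31), ("month", ts.month - 1, 12),
            ("weekday", ts.weekday(), 7),
        ):
            row[f"{name}_sin"], row[f"{name}_cos"] = cyclical(value, period)
        for key, series in stats.items():
            row[key] = series[i]
        rows.append(row)
    return rows


# Example with synthetic half-hourly readings (values are made up).
start = datetime(2012, 1, 2, 0, 0)
stamps = [start + timedelta(minutes=30 * i) for i in range(8)]
load = [0.2, 0.3, 0.1, 0.5, 0.4, 0.6, 0.2, 0.3]
features = build_features(stamps, load)
```

The sine and cosine pair keeps neighbouring times close in feature space, so that, for example, 23:30 and 00:00 are not treated as far apart.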


Figure 4 presents an overview of the daily energy consumption of households from the dataset. Figure 4 (a) shows the distribution of households in the dataset by their daily energy consumption range. Most households in the dataset have an average daily energy consumption that falls within the interval of 0 to 10 or 10 to 20 kWh/day. The average daily energy consumption is computed for overall households as an hourly average and is illustrated in Fig. 4 (b).

Fig. 4 — Households’ energy profiles analysis: (a) Number of households by daily energy consumption range and (b) Hourly energy consumption.

Figure 5 (a) shows the monthly average energy consumption, computed over all households, with the seasons represented in different colours. The lowest energy consumption occurs during the summer months (yellow) and the highest during the winter (blue). Figure 5 (b) presents a heatmap of the average energy consumption for each day of the week and how it varies by month. The colour intensity in the heatmap indicates the value of the energy consumption, from blue (high) to light yellow (low).

Fig. 5 — Statistical features analysis: (a) Overall monthly energy consumption and (b) Day of the week energy consumption by month.

We have clustered the household prosumers based on their energy profile features using the methodology presented in³¹. In the process, Min-Max normalization was applied, which scales all values to the range 0.0 to 1.0. Specifically, the minimum value of each feature is transformed to 0, the maximum value to 1, and all other values to a decimal between 0 and 1. This step mitigates the impact of varying data magnitudes on the subsequent clustering analyses, preventing the associated biases and facilitating the use of distance-based metrics in data exploration. Three clustering algorithms, K-means³⁶, K-medoids³⁷, and Hierarchical clustering³⁸, are applied to segment the data based on the features of each load profile. Determining the optimal number of clusters is challenging, as it typically cannot be known in advance. Therefore, the clustering algorithms are tested over a predefined range of 2 to 30 clusters. This range is systematically explored to determine the most appropriate number of clusters using three evaluation metrics: the Silhouette Score (SIL), the Davies-Bouldin Index (DBI), and the Calinski-Harabasz Index (CHI). Table 2 shows that the optimal number of clusters for our case is three. The only exception is K-medoids, for which the optimal number of clusters is two. However, as discussed in previous studies, K-medoids is not reliable for tasks of this nature, so its results are excluded from further analysis. The results of K-means and Hierarchical clustering mostly agree, with only minor exceptions.
Since K-means achieves the better scores across the evaluation metrics (SIL, DBI, and CHI), the labels selected by this algorithm are incorporated into the proposed solution for further assessment.
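The Min-Max step described above can be sketched in a few lines of plain Python. This is only an illustration: the feature names and values are hypothetical, and mapping a constant feature to 0.0 is an assumption of the sketch rather than something stated in the text.

```python
def min_max_normalize(rows):
    """Scale each feature (column) of `rows` to the range [0.0, 1.0]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # Assumption of this sketch: a constant feature is mapped to 0.0
            # instead of dividing by zero.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Hypothetical load-profile features: [mean kWh, peak kWh, night-time share]
profiles = [
    [0.20, 1.0, 0.30],
    [0.50, 3.0, 0.10],
    [0.80, 2.0, 0.20],
]
normalized_profiles = min_max_normalize(profiles)
```

After this step every feature contributes on the same scale to the distances used by the clustering algorithms.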

Table 2.

The optimal number of clusters for the clustering algorithms based on three evaluation metrics.

Clustering algorithm	SIL	DBI	CHI
K-means	3	3	3
HAC	3	3	3
K-medoids	2	2	2


Figure 6 illustrates the SIL scores for all clustering algorithms evaluated across the selected range of cluster numbers, offering a clear comparison of their performance. The figure highlights the consistent superiority of K-means, as it achieves the highest SIL scores for most of the tested configurations. This trend underscores the robustness of K-means in identifying well-separated and compact clusters.
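To make the SIL criterion concrete, the following is a minimal sketch of the silhouette score in plain Python, assuming Euclidean distance. The four points and the two labelings are hypothetical. In practice a library routine such as scikit-learn's `silhouette_score` would normally be used instead.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster
        # (the distance of the point to itself is 0, so it drops out).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(points)


# Hypothetical 2-D points forming two compact, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # groups kept together
bad_score = silhouette_score(points, [0, 1, 0, 1])   # groups split apart
```

A labeling that keeps compact groups together scores close to 1, while one that splits them scores below 0. This is why the highest SIL value across the tested range is used to pick the number of clusters.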

Fig. 6 — SIL scores for every clustering algorithm under the selected range.

Fig. 7 presents a visual depiction of the clustering outcomes derived from our methodology, overlaying the time-series data. Each cluster is represented by a unique color to enhance visual distinction, with its respective median trend line displayed in the same color to emphasize the central tendency within the cluster. The clusters reveal subtle yet meaningful variations in energy consumption patterns, primarily distinguished by the volume of usage, providing insights into the underlying structure of our dataset. More specifically, Cluster 1 exhibits the largest daytime peaks, reflecting higher activity levels. Cluster 2 shows a moderate level of energy usage, with peaks smaller than those of Cluster 1 but still pronounced compared to Cluster 0. Despite these differences, all clusters share a common temporal structure influenced by similar daily cycles across the dataset.

The evaluation setup for each layer in the federated architecture is presented in Fig. 8. The edge devices, represented by different versions of Raspberry Pi, are mapped to the corresponding households in the dataset. For each cluster, a fog device (Intel Core i3 with 8 GB RAM) was used for the aggregation and hyperparameter tuning process. Each edge device is connected to the fog node that represents the cluster to which the consumer belongs.

We have developed applications for script handling and communication exchanges, one for each layer of the federated architecture, using Spring Boot 3.2.5 with Java 17⁴⁰. Maven is used as the dependency manager, and communication among nodes is established through Representational State Transfer (REST) calls. Python 3.12.3⁴¹ and TensorFlow 2.18⁴² are used for building the scripts for data and model manipulation. The applications and the scripts run in Docker containers deployed on the federated architecture nodes, providing a virtual environment featuring the following libraries: (i) TensorFlow for creating and managing models, (ii) Pandas⁴³ for reading data from comma-separated values (CSV) files and processing it through feature engineering pipelines, and (iii) Scikit-learn⁴⁴ for scaling tasks. Additionally, SciPy⁴⁵ was used for constants and special functions, such as the Boltzmann constant, while Argparse⁴⁶ handled the parsing of script arguments. Protobuf⁴⁷ was used for building the image, and Matplotlib⁴⁸ for generating plots. The GA for hyperparameter optimization is implemented using the Java library Jenetics 7.0.0⁴⁹ and is deployed in the Docker containers on the fog nodes. The SA algorithm for model aggregation is implemented from scratch and runs in the Docker containers on the cloud and fog nodes. For monitoring network traffic and hyperparameters, we used features of the Spring Framework along with a custom caching mechanism to capture the state of the algorithms across iterations. The code of our federated solution is available on GitHub⁵⁰.

The energy prediction model architecture is designed using the Keras library⁵¹ and is constructed with sequential layers, the core layer being the LSTM, with ReLU⁵² as the activation function. The input consists of sequences of length 48 with 6 features each. The first LSTM layer contains 32 units, a value determined through repeated experiments that correlated the layer size with the quality of the predictions. A second LSTM layer with 64 units is then applied. Finally, a Dense layer with 16 units is followed by a Dense layer with 1 unit that outputs the predicted value. To update the model's weights, we used the Adam optimizer⁵³ with the Mean Squared Error (MSE) as the loss function.
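As a rough guide to the size of this architecture, the number of trainable parameters can be estimated from the layer sizes given above. The sketch assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias) and Dense layers with biases. These are the usual Keras defaults, but they are not confirmed by the text.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of a standard LSTM layer (four gates, with bias)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Trainable parameters of a fully connected layer with bias."""
    return input_dim * units + units


# Layer sizes taken from the description above; the layer names are illustrative.
layers = [
    ("lstm_32", lstm_params(6, 32)),    # 6 input features -> 32 units
    ("lstm_64", lstm_params(32, 64)),   # 32 -> 64 units
    ("dense_16", dense_params(64, 16)), # 64 -> 16 units
    ("dense_1", dense_params(16, 1)),   # 16 -> 1 output value
]
total_params = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name}: {count}")
print(f"total: {total_params}")
```

Under these assumptions the model has about 30.9 thousand trainable parameters, most of them in the second LSTM layer.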

Figure 9 reports the prediction accuracy of our FL methodology compared with other state-of-the-art methods, using the average Mean Squared Error (MSE) for household energy prediction over several iterations (executed on daily energy profiles from 2013-07-10 to 2013-07-20). For a series of iterations, the performance of the aggregated model at the cloud level was analysed, as well as the execution time and the volume of data transmitted over the network. Compared with FedAVG³⁹, the hyperparameter tuning method helps the model converge earlier and, by finding the optimal hyperparameters for training, prevents the spikes of the MSE during iterations.

Fig. 9 — Average prediction accuracy of our federated model compared with state of the art methods.

We have compared the accuracy of our FL energy prediction model for each edge device, each representing a household. The results presented in Table 3 show that, on average, the model outperforms the considered baseline, the FedAVG algorithm. Our solution effectively captures patterns in household energy profiles through clustering and hyperparameter tuning. It demonstrates superior performance in scenarios where FedAVG struggles, such as for the households with device IDs MAC001198 and MAC000321. By introducing greater variance in the energy data used during training and later in cluster-level cross-validation, our model generalizes well. For the rest of the households used in testing, it achieves accuracy similar to that of FedAVG while keeping prediction deviations small.

Table 3.

Prediction accuracy of individual households at the edge.

Edge Device ID	Our federated model MAE	Our federated model R²	Our federated model RMSE	FedAVG MAE	FedAVG R²	FedAVG RMSE
MAC000434	0.044188375	0.9792175	0.058825875	0.01947587	0.9962506	0.024245411
MAC004505	0.088838167	0.971611333	0.1101665	0.01595106	0.9980114	0.020075945
MAC001441	0.099584625	0.94392825	0.16032175	0.018966151	0.9900336	0.022822501
MAC002451	0.094873286	0.968307667	0.136992286	0.020804703	0.99762475	0.027247075
MAC001326	0.0564565	0.983215375	0.075719875	0.013383849	0.99841934	0.0175527
MAC004290	0.056729286	0.989875286	0.074915429	0.015129962	0.9926895	0.035435524
MAC002163	0.111804286	0.916483714	0.135642	0.035224594	0.9912712	0.10221271
MAC001198	0.077307667	0.990275167	0.104408333	0.7507679	0.76109415	1.3125988
MAC000321	0.039661333	0.968815	0.048596889	0.19151235	0.8691262	0.68717957
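The per-household figures in Table 3 can be aggregated with a short script to see how the average is formed. The sketch below copies the MAE columns from the table. The variable names are illustrative, and the arithmetic mean is an assumption about how the averages were obtained.

```python
from statistics import mean

# (device ID, MAE of our federated model, MAE of FedAVG), copied from Table 3.
table3_mae = [
    ("MAC000434", 0.044188375, 0.01947587),
    ("MAC004505", 0.088838167, 0.01595106),
    ("MAC001441", 0.099584625, 0.018966151),
    ("MAC002451", 0.094873286, 0.020804703),
    ("MAC001326", 0.0564565, 0.013383849),
    ("MAC004290", 0.056729286, 0.015129962),
    ("MAC002163", 0.111804286, 0.035224594),
    ("MAC001198", 0.077307667, 0.7507679),
    ("MAC000321", 0.039661333, 0.19151235),
]

ours_mean = mean(row[1] for row in table3_mae)
fedavg_mean = mean(row[2] for row in table3_mae)

# Households where each approach has the lower MAE.
ours_better = [row[0] for row in table3_mae if row[1] < row[2]]
fedavg_better = [row[0] for row in table3_mae if row[2] < row[1]]

print(f"mean MAE, our model: {ours_mean:.5f}")
print(f"mean MAE, FedAVG:    {fedavg_mean:.5f}")
print(f"our model has the lower MAE for: {ours_better}")
```

The two means come out at roughly 0.0744 and 0.1201. The difference is driven mainly by MAC001198 and MAC000321, where the FedAVG error is an order of magnitude larger than on the other households.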


The execution time and the network traffic are measured over iterations to provide an overview of the costs implied by integrating the proposed aggregation method and hyperparameter tuning process. Figure 10 shows the execution time for each iteration involving a complete update of the federated energy prediction model. Because the genetic heuristic must evaluate many combinations of parameters in the search space, it adds computational overhead. The time depends on how many chromosomes are selected for validation during the GA evolution and on how fast the edge nodes respond with the computed performance or the updated model. Additionally, the execution time increases because of the steps that involve additional communication with the households' validation nodes inside the same cluster.

Fig. 10 — Execution time for each prediction model update iteration.

The computational complexity of our solution is influenced by the additional costs introduced by the GA for hyperparameter optimization and by the simulated annealing (SA) solution for prediction model aggregation. In the case of the GA, the complexity per cluster is directly influenced by the size of the initial population, the number of iterations, and the complexity of the fitness function. The resulting cost expression involves the total number of edge nodes in the cluster, the number of training nodes in the cluster, the average data size per training node, the dimensionality of the model, and the number of hyperparameter configurations ∣Φ∣.

The computational cost of SA for model aggregation per cluster depends on the number of edge devices in the cluster and on the complexity of the objective function, which is the model aggregation loss.

Despite the additional complexity introduced by the GA and SA algorithms, the execution time for each model iteration remains within reasonable bounds for solutions requiring day-ahead energy prediction for energy prosumers. Additionally, the accuracy gains are significant compared to other state-of-the-art federated models. The complexity could be further managed by selecting and sampling only a subset of edge devices or model parameters to approximate the objective function, reducing the dependence on the number of edge devices per cluster and on the dimensionality of the prediction model.
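To illustrate how simulated annealing can search for aggregation weights that favour better-performing models, a generic sketch follows. It is not the authors' implementation: the toy parameter vectors, the loss function, the neighbourhood move, and the cooling schedule are all assumptions chosen for brevity.

```python
import math
import random


def aggregate(models, weights):
    """Weighted average of parameter vectors (weights are assumed to sum to 1)."""
    size = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(size)]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def anneal_weights(models, loss_fn, steps=500, start_temp=1.0, cooling=0.99, seed=0):
    """Search for aggregation weights that minimize loss_fn(aggregated model)."""
    rng = random.Random(seed)
    current = normalize([1.0] * len(models))  # start from plain averaging
    current_loss = loss_fn(aggregate(models, current))
    best, best_loss = current, current_loss
    temp = start_temp
    for _ in range(steps):
        # Neighbour: perturb one weight, keep it positive, renormalize.
        candidate = list(current)
        idx = rng.randrange(len(candidate))
        candidate[idx] = max(1e-6, candidate[idx] + rng.uniform(-0.1, 0.1))
        candidate = normalize(candidate)
        candidate_loss = loss_fn(aggregate(models, candidate))
        delta = candidate_loss - current_loss
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse candidates while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
        temp *= cooling  # geometric cooling schedule
    return best, best_loss


# Hypothetical example: three edge models, the third one far from the target.
models = [[1.0, 1.0], [0.0, 0.0], [5.0, 5.0]]
target = [0.9, 0.9]


def loss_fn(params):
    """Toy stand-in for the aggregation loss: squared distance to the target."""
    return sum((p - t) ** 2 for p, t in zip(params, target))


uniform_loss = loss_fn(aggregate(models, normalize([1.0] * len(models))))
best_weights, best_loss = anneal_weights(models, loss_fn)
```

Each step evaluates the objective once on an aggregate of all models in the cluster, which is why the cost grows with both the number of edge devices and the model dimensionality.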

Figure 11 shows the data transmission overhead introduced by our federated solution for all the layers. The quantity of data transmitted at the edge and fog layers is computed as an average across all nodes. The proposed FL methodology has minimal impact on incoming and outgoing traffic among nodes on different architectural layers, which is beneficial when network resources are limited, as is the case for edge nodes in smart grids. In our case, the hyperparameter tuning reduces the size of model updates sent between nodes at the edge and fog layers, as the GA tunes efficiency-related parameters such as batch size, learning rate, and update frequency. Therefore, the FL-based solution can scale more effectively across larger energy networks with many households associated with edge devices without overwhelming the data network infrastructure. Additionally, the low network traffic overhead of our solution reduces the energy consumption of edge devices, which is particularly important in households where energy management often overlaps with the integration of smart homes into energy grids. GA-based hyperparameter optimization minimizes the communication rounds required for accurate household energy prediction. This not only optimizes the use of data network resources but also smooths the data transmission patterns between nodes, making the data flow in federated prediction model updates more stable and manageable. Therefore, our federated energy prediction model converges faster, leading to quicker decision-making on edge and fog devices and contributing to the management of microgrids.

Fig. 11 — The volume of network traffic for cloud, fog, and edge: (a) incoming and (b) outgoing.

The best fitness score and the diversity of each fog population are represented in Fig. 12. The fitness score (see Fig. 12a) is computed as the performance of the hyperparameter configuration on the selected validation node. The fitness score of the best chromosome becomes more stable in the later iterations, as the algorithm progresses. This stability reflects a more refined and accurate prediction model as the FL process converges: the federated model approaches an optimal solution across all households' edge nodes, leading to better energy predictions. The diversity of the population on each fog node (see Fig. 12b) helps prevent premature convergence and supports a more robust solution for the federated energy prediction model. The diversity varies based on local conditions, such as the heterogeneity of the households' energy data. However, the clustering of households based on energy profiles and the cross-validation of the model between the edge nodes of the same cluster help in exploring a wide solution space. Our federated model explores not only individual households' patterns but also broader trends within the cluster, as it benefits from both local (individual household) and group (cluster) data patterns. Consequently, different fog nodes can host distinct local populations of chromosomes, representing local solutions to the energy prediction problem.
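The interplay between best fitness and population diversity can be illustrated with a small genetic algorithm over hyperparameter configurations. This is a generic Python sketch, not the Jenetics-based implementation used in the paper. The search space, the `toy_fitness` function (a stand-in for the validation loss returned by an edge node, lower is better), and the selection, crossover, and mutation settings are all hypothetical.

```python
import random

# Hypothetical search space; the paper's actual ranges are not given here.
SEARCH_SPACE = {
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64, 128],
    "epochs": [1, 2, 5, 10],
}
GENES = list(SEARCH_SPACE)


def random_chromosome(rng):
    return tuple(rng.choice(SEARCH_SPACE[gene]) for gene in GENES)


def toy_fitness(chromosome):
    """Stand-in for the validation loss on an edge node (lower is better)."""
    learning_rate, batch_size, epochs = chromosome
    return (abs(learning_rate - 0.001) * 100
            + abs(batch_size - 32) / 32
            + abs(epochs - 5) / 5)


def evolve(generations=20, population_size=12, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(population_size)]
    history = []  # (best fitness, diversity) for each generation
    for _ in range(generations):
        ranked = sorted(population, key=toy_fitness)
        best = ranked[0]
        # Diversity: share of distinct chromosomes in the population.
        diversity = len(set(population)) / population_size
        history.append((toy_fitness(best), diversity))
        parents = ranked[: population_size // 2]  # truncation selection
        children = [best]                         # elitism keeps the best one
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            for i, gene in enumerate(GENES):
                if rng.random() < mutation_rate:  # random-reset mutation
                    child[i] = rng.choice(SEARCH_SPACE[gene])
            children.append(tuple(child))
        population = children
    return min(population, key=toy_fitness), history


best_config, history = evolve()
```

Because the best chromosome is always carried over, the best fitness in `history` never gets worse from one generation to the next, while mutation keeps the diversity from collapsing to a single configuration too early.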

To benchmark the energy prediction accuracy of our methodology, we have used the FedAVG, FedProx, and FedMIME implementations from the TensorFlow Federated framework. FedProx is an extension of FedAVG that incorporates a regularization term to handle heterogeneous client data and improve stability in non-IID settings, whilst FedMIME is a personalized federated learning method. The energy consumption values were scaled using Standard Scaler, the dataset was split into training and testing sets (80%-20%), and the federated model was trained over 10 communication rounds. The metrics were computed on the testing set for each client, using the global model. For FedProx, we set the proximal strength to 0.01 to balance stabilizing updates from heterogeneous data against allowing local model adaptation, and the Yogi client optimizer was used with a learning rate of 0.01. For FedMIME, the Yogi optimizer was used for both the base and server optimizers, with learning rates of 0.001 and 0.01, respectively. Table 4 presents the average values of those metrics computed over all clients and the relative improvement of our solution.
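For readers unfamiliar with the two baselines, the sketch below shows sample-weighted FedAVG aggregation and a single FedProx local update in plain Python. It does not use TensorFlow Federated, and it replaces the Yogi optimizer with a plain gradient step for brevity. The function names and the example values are illustrative.

```python
def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    size = len(client_weights[0])
    return [
        sum(n * w[i] for w, n in zip(client_weights, client_sizes)) / total
        for i in range(size)
    ]


def fedprox_local_step(weights, global_weights, gradient,
                       learning_rate=0.01, mu=0.01):
    """One gradient step on the FedProx local objective.

    FedProx adds the proximal term (mu / 2) * ||w - w_global||^2 to the local
    loss, so its gradient contributes mu * (w - w_global) to the update.
    """
    return [
        w - learning_rate * (g + mu * (w - wg))
        for w, wg, g in zip(weights, global_weights, gradient)
    ]


# Two hypothetical clients holding 1 and 3 samples, respectively.
global_model = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])

# With a zero loss gradient, the proximal term alone pulls the local
# parameters back towards the global model.
pulled = fedprox_local_step([1.0], [0.0], [0.0], learning_rate=0.1, mu=0.5)
```

With `mu` set to 0 the local step reduces to ordinary gradient descent, which is why FedProx is usually described as a generalization of FedAVG.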

Table 4.

Average prediction accuracy of our solution compared with state-of-the-art aggregation methods.

Metric	FedAVG	FedProx	FedMIME	Our federated model	Improvement over FedAVG (%)	Improvement over FedProx (%)	Improvement over FedMIME (%)
MAE	0.12014	0.10431	0.50415	0.07438	38.08	28.69	85.25
RMSE	0.24993	0.22456	0.69997	0.10062	59.74	55.19	85.62
R2	0.95495	0.96366	0.31188	0.96797	28.91	11.85	95.35


Our federated model demonstrates consistent performance improvements over FedAVG, FedProx, and FedMIME across all evaluated metrics. Compared to FedAVG, the MAE decreased on average by 38% and the RMSE by 59%, while the R2 metric improved by 28%. Similarly, the average accuracy improvements over FedProx were 28% for MAE, 55% for RMSE, and 11% for R2. The prediction performance of FedMIME was worse than that of FedAVG and FedProx due to its focus on personalization and the relatively small number of training examples (hourly consumption data over less than one year). Thus, the improvement was higher in this case (over 85%).
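The relative improvements in Table 4 can be recomputed from the averaged metrics. For MAE and RMSE the usual formula (baseline minus ours, divided by baseline) applies directly. For R2, the reported percentages are consistent with applying the same formula to the unexplained variance, 1 - R2. This is an inference from the numbers, not something stated in the text.

```python
# Averaged metrics copied from Table 4.
table4 = {
    "MAE": {"FedAVG": 0.12014, "FedProx": 0.10431, "FedMIME": 0.50415, "ours": 0.07438},
    "RMSE": {"FedAVG": 0.24993, "FedProx": 0.22456, "FedMIME": 0.69997, "ours": 0.10062},
    "R2": {"FedAVG": 0.95495, "FedProx": 0.96366, "FedMIME": 0.31188, "ours": 0.96797},
}


def as_error(metric, value):
    """Express every metric as an error where lower is better.

    Assumption of this sketch: R2 is converted to unexplained variance, 1 - R2.
    """
    return 1.0 - value if metric == "R2" else value


def relative_improvement(baseline_error, our_error):
    """Percentage reduction of the error with respect to the baseline."""
    return 100.0 * (baseline_error - our_error) / baseline_error


improvements = {
    metric: {
        name: relative_improvement(as_error(metric, value),
                                   as_error(metric, row["ours"]))
        for name, value in row.items()
        if name != "ours"
    }
    for metric, row in table4.items()
}

for metric, row in improvements.items():
    print(metric, {name: round(value, 2) for name, value in row.items()})
```

The recomputed values agree with the table to within rounding of the averaged inputs.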

As a final note, the hierarchical FL methodology and adaptive hyperparameter tuning strategy presented here are not restricted to energy prediction and can be applied in diverse fields characterized by data decentralization and privacy concerns. Examples include distributed healthcare analytics (e.g., hospital-level patient data)⁵⁴,⁵⁵, language modeling⁵⁶,⁵⁷, traffic⁵⁸, and telecommunications⁵⁹ forecasting, among others. In each case, grouping similar data sources into clusters and adjusting hyperparameters to local conditions enhances performance, robustness, and scalability. Likewise, the GA-based hyperparameter tuning method is domain-agnostic. It can efficiently search large and complex hyperparameter spaces to identify near-optimal configurations without requiring explicit assumptions about the underlying data distribution or the nature of the predictive task. This flexibility makes the proposed approach readily transferable to other fields where FL and hyperparameter optimization are needed.

Conclusions

The proposed hierarchical federated learning solution for household energy prediction captures household energy patterns well through clustering and hyperparameter tuning, excelling in scenarios where FedAVG underperforms, with an average accuracy improvement of about 20%. It ensures good generalization by introducing greater variance in training and cluster-level cross-validation, while achieving accuracy comparable to FedAVG in scenarios where FedAVG excels (around 4%). Additionally, it outperforms FedProx and FedMIME, with significant gains in prediction accuracy. The network traffic is kept below 30 KB, and hyperparameter tuning reduces model update sizes and communication rounds by 30%, making the approach efficient in resource-constrained networks.

Acknowledgements/Funding

This research received funding from the European Union’s Horizon Europe research and innovation program under Grant Agreements number 101136216 (Hedge-IoT) and 101103998 (DEDALUS). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Climate, Infrastructure, and Environment Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Author contributions

Conceptualization, T.C., I.A., V.M. and Ef.S.; methodology, T.C., L.T., El.S. and V.M.; writing (original draft preparation), L.T., M.D., T.C., I.A., V.M., Ef.S. and El.S.; writing (review and editing), L.T., M.D., T.C., I.A., V.M., Ef.S. and El.S. All authors read and agreed to the submitted version of the manuscript.

Data availability

All data generated or analysed during this study are included in this published article.

Declarations

Competing interests

The authors declare no competing interests.

Human ethics and consent to participate declarations

Not applicable.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Tudor Cioara, Email: tudor.cioara@cs.utcluj.ro.

Ionut Anghel, Email: ionut.anghel@cs.utcluj.ro.

References

1.Sarmas, E. et al. Revving up energy autonomy: A forecast-driven framework for reducing reverse power flow in microgrids. Sustain. Energy Grids Netw. 38, 101376. 10.1016/j.segan.2024.101376 (Jun. 2024).
2.Aslam, S. et al. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renew. Sustain. Energy Rev. 144, 110992. 10.1016/j.rser.2021.110992 (Jul. 2021).
3.Zhu, J. et al. Review and prospect of data-driven techniques for load forecasting in integrated energy systems. Appl. Energy 321, 119269. 10.1016/j.apenergy.2022.119269 (Sep. 2022).
4.Olusogo Popoola, M. et al. A critical literature review of security and privacy in smart home healthcare schemes adopting IoT & blockchain: Problems, challenges and solutions, Blockchain: Research and Applications, Volume 5, Issue 2, 100178, ISSN 2096–7209, (2024). 10.1016/j.bcra.2023.100178
5.Vigurs, C., Maidment, C., Fell, M. & Shipworth, D. Customer privacy concerns as a barrier to sharing data about energy use in smart local energy systems: A rapid realist review. Energies 14 (5), 1285 (2021).
6.Taïk, A. & Cherkaoui, S. ‘Electrical Load Forecasting Using Edge Computing and Federated Learning’, in ICC 2020–2020 IEEE International Conference on Communications (ICC), Jun. pp. 1–6. (2020). 10.1109/ICC40277.2020.9148937
7.Liu, H., Zhang, X., Shen, X. & Sun, H. ‘A federated learning framework for smart grids: Securing power traces in collaborative learning’, Nov. 01, 2021, arxiv: arxiv:2103.11870. 10.48550/arXiv.2103.11870
8.Li, Q., Diao, Y., Chen, Q. & He, B. ‘Federated Learning on Non-IID Data Silos: An Experimental Study’, in 2022 IEEE 38th International Conference on Data Engineering (ICDE), May pp. 965–978. (2022). 10.1109/ICDE53745.2022.00077
9.Zhu, H., Xu, J., Liu, S. & Jin, Y. ‘Federated learning on non-IID data: A survey’, Neurocomputing, vol. 465, pp. 371–390, Nov. (2021). 10.1016/j.neucom.2021.07.098
10.Savi, M. & Olivadese, F. Short-Term energy consumption forecasting at the edge: A federated learning approach. IEEE Access 9, 95949–95969. 10.1109/ACCESS.2021.3094089 (2021).
11.Briggs, C., Fan, Z. & Andras, P. Federated learning for Short-Term residential load forecasting. IEEE Open Access J. Power Energy 9, 573–583. 10.1109/OAJPE.2022.3206220 (2022).
12.He, Y., Luo, F., Ranzi, G. & Kong, W. ‘Short-Term Residential Load Forecasting Based on Federated Learning and Load Clustering’, in 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. pp. 77–82. (2021). 10.1109/SmartGridComm51999.2021.9632314
13.Tun, Y. L., Thar, K., Thwal, C. M. & Hong, C. S. ‘Federated Learning based Energy Demand Prediction with Clustered Aggregation’, in IEEE International Conference on Big Data and Smart Computing (BigComp), Jan. 2021, pp. 164–167. (2021). 10.1109/BigComp51126.2021.00039
14.Gholizadeh, N. & Musilek, P. ‘Federated learning with hyperparameter-based clustering for electrical load forecasting’, Internet Things, vol. 17, p. 100470, Mar. (2022). 10.1016/j.iot.2021.100470
15.Fernández, J. D., Menci, S. P., Lee, C. M., Rieger, A. & Fridgen, G. Privacy-preserving federated learning for residential short-term load forecasting. Appl. Energy. 326, 119915. 10.1016/j.apenergy.2022.119915 (Nov. 2022).
16.Duttagupta, A., Zhao, J. & Shreejith, S. ‘Exploring Lightweight Federated Learning for Distributed Load Forecasting’, in 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. pp. 1–6. (2023). 10.1109/SmartGridComm57358.2023.10333889
17.Wang, Y. & Guo, Q. Privacy-Preserving and adaptive federated deep learning for multiparty wind power forecasting. IEEE Trans. Ind. Appl. 1–11. 10.1109/TIA.2024.3430229 (2024).
18.Fekri, M. N., Grolinger, K. & Mir, S. Distributed load forecasting using smart meter data: federated learning with recurrent neural networks. Int. J. Electr. Power Energy Syst.137, 107669. 10.1016/j.ijepes.2021.107669 (May 2022).
19.Hu, Y., Ren, H., Hu, C., Deng, J. & Xie, X. An Element-Wise Weights Aggregation Method for Federated Learning, 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China, 2023, pp. 188–196. 10.1109/ICDMW60847.2023.00031
20.Hu, Z., Shaloudegi, K., Zhang, G. & Yu, Y. Federated Learning Meets Multi-Objective Optimization, in IEEE Transactions on Network Science and Engineering, vol. 9, no. 4, pp. 2039–2051, 1 July-Aug. (2022). 10.1109/TNSE.2022.3169117
21.Chifu, V., Cioara, T., Anitiei, C., Pop, C. & Anghel, I. ‘FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction’, Sep. 19, 2023, arXiv: arXiv:2309.10337. 10.48550/arXiv.2309.10337
22.Raiaan, M. A. K., Sakib, S., Fahad, N. M., Mamun, A. A., Rahman, M. A., Shatabda, S. & Mukta, M. S. H. A systematic review of hyperparameter optimization techniques in convolutional neural networks. Decis. Analytics J. 11. 10.1016/j.dajour.2024.100470 (2024).
23.Zhou, J., Pal, S., Dong, C. & Wang, K. Enhancing quality of service through federated learning in edge-cloud architecture. Ad Hoc Netw. 156. 10.1016/j.adhoc.2024.103430 (2024).
24.Kundroo, M. & Kim, T. Federated learning with hyper-parameter optimization. J. King Saud University-Computer Inform. Sci. 35 (9), 101740 (2023).
25.Qolomany, B., Ahmad, K., Al-Fuqaha, A. & Qadir, J. Particle Swarm Optimized Federated Learning For Industrial IoT and Smart City Services, GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, pp. 1–6, (2020). 10.1109/GLOBECOM42002.2020.9322464
26.Fahd, N. et al. Pelican optimization algorithm with federated learning driven attack detection model in internet of things environment. Future Generation Comput. Syst. 148, 118–127. 10.1016/j.future.2023.05.029 (2023).
27.Połap, D. et al. A heuristic approach to the hyperparameters in training spiking neural networks using spike-timing-dependent plasticity. Neural Comput. Applic. 34, 13187–13200. 10.1007/s00521-021-06824-8 (2022).
28.Bukhari, S. M. S., Moosavi, S. K. R., Zafar, M. H., Mansoor, M., Mohyuddin, H., Ullah, S. S., … Sanfilippo, F. Federated transfer learning with orchard-optimized Conv-SGRU: A novel approach to secure and accurate photovoltaic power forecasting. Renewable Energy Focus 48, 100520 (2024).
29.Michalakopoulos, V. et al. A machine learning-based framework for clustering residential electricity load profiles to enhance demand response programs. Appl. Energy 361. 10.1016/j.apenergy.2024.122943 (2024).
30.Li, J. et al. Federated learning-based short-term building energy consumption prediction method for solving the data silos problem. Build. Simul. 15, 1145–1159. 10.1007/s12273-021-0871-y (2022).
31.Petrangeli, E., Tonellotto, N. & Vallati, C. Performance evaluation of federated learning for residential energy forecasting. IoT 3 (3), 381–397 (2022).
32.Michalakopoulos, V., Sarantinopoulos, E., Sarmas, E. & Marinakis, V. Empowering federated learning techniques for privacy-preserving PV forecasting. Energy Rep. 12, 2244–2256. 10.1016/j.egyr.2024.08.033 (2024).
33.Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science, New Series 220 (4598), 671–680 (May 13, 1983).
34.Man, K. F., Tang, K. S. & Kwong, S. Genetic Algorithms: Concepts and Applications, IEEE Transactions on Industrial Electronics, Vol. 43, No. 5, 519 (October 1996).
35.UK Power Networks. SmartMeter Energy Consumption Data in London Households, https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households
36.David Arthur and Sergei Vassilvitskii. K-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (SODA ‘07). Society for Industrial and Applied Mathematics, USA, 1027–1035. (2007).
37.Leonard Kaufman, Peter, J. & Rousseeuw Finding Groups in Data: An Introduction to Cluster Analysis, ISBN:9780471878766 |Online ISBN:9780470316801 (1990). 10.1002/9780470316801
38.Murtagh, F. & Contreras, P. Algorithms for hierarchical clustering: an overview. WIREs Data Min. Knowl. Discov. 2, 86–97. 10.1002/widm.53 (2012).
39.McMahan, H. B., Moore, E., Ramage, D., Hampson, S. & Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) (2017).
40.Spring Boot. https://spring.io/projects/spring-boot
41.Python https://www.python.org/downloads/release/python-3123/
42.Tensorflow https://github.com/tensorflow/tensorflow/releases
43.Pandas https://pandas.pydata.org/.
44.scikit-learn, https://scikit-learn.org/stable/
45.SciPy https://scipy.org/.
46.argparse https://docs.python.org/3/library/argparse.html
47.Protobuf https://protobuf.dev/.
48.Matplotlib https://matplotlib.org/.
49.Jenetics https://jenetics.io/.
50.Heuristic-Based Federated Learning on GitHub. https://github.com/mihaid150/Heuristic-Adaptive-Federated-Learning
51.Keras https://keras.io/.
52.Agarap, A. F. M. Deep Learning using Rectified Linear Units (ReLU) (2018). https://arxiv.org/abs/1803.08375
53.Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization (2014). https://arxiv.org/abs/1412.6980
54.Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inf. 112, 59–67. 10.1016/j.ijmedinf.2018.01.007 (Apr. 2018).
55.Choudhury, O. et al. Differential Privacy-enabled Federated Learning for Sensitive Health Data, Feb. 27, 2020, arXiv: arXiv:1910.02578. 10.48550/arXiv.1910.02578
56.McMahan, H. B., Ramage, D., Talwar, K. & Zhang, L. Learning Differentially Private Recurrent Language Models, Feb. 23, 2018, arXiv: arXiv:1710.06963. 10.48550/arXiv.1710.06963
57.Wu, X., Liang, Z. & Wang, J. FedMed: A Federated Learning Framework for Language Modeling, Sensors, vol. 20, no. 14, Art. no. 14, Jan. (2020). 10.3390/s20144048
58.Liu, Y., Yu, J. J. Q., Kang, J., Niyato, D. & Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach, IEEE Internet Things J., vol. 7, no. 8, pp. 7751–7763, Aug. (2020). 10.1109/JIOT.2020.2991401
59.Perifanis, V., Pavlidis, N., Koutsiamanis, R. A. & Efraimidis, P. S. Federated learning for 5G base station traffic forecasting. Comput. Netw.235, 109950. 10.1016/j.comnet.2023.109950 (Nov. 2023).
60.Microsoft PowerPoint. https://www.microsoft.com/ro-ro/microsoft-365/powerpoint

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Zhu, J. et al. ‘Review and prospect of data-driven techniques for load forecasting in integrated energy systems’, Appl. Energy, 321, 119269, DOI: 10.1016/j.apenergy.2022.119269.Sep. (2022).

Data Availability Statement

All data generated or analysed during this study are included in this published article .

[CR1] 1.Sarmas, E. et al. Revving up energy autonomy: A forecast-driven framework for reducing reverse power flow in microgrids’, sustain. Energy Grids Netw.38, 101376. 10.1016/j.segan.2024.101376 (Jun. 2024).

[CR2] 2.Aslam, S. et al. A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids’, renew. Sustain. Energy Rev.144, 110992. 10.1016/j.rser.2021.110992 (Jul. 2021).

[CR3] 3.Zhu, J. et al. ‘Review and prospect of data-driven techniques for load forecasting in integrated energy systems’, Appl. Energy, 321, 119269, DOI: 10.1016/j.apenergy.2022.119269.Sep. (2022).

[CR4] 4.Olusogo Popoola, M. et al. A critical literature review of security and privacy in smart home healthcare schemes adopting IoT & blockchain: Problems, challenges and solutions, Blockchain: Research and Applications, Volume 5, Issue 2, 100178, ISSN 2096–7209, (2024). 10.1016/j.bcra.2023.100178

[CR5] 5.Vigurs, C., Maidment, C., Fell, M. & Shipworth, D. Customer privacy concerns as a barrier to sharing data about energy use in smart local energy systems: A rapid realist review. Energies14 (5), 1285 (2021). [Google Scholar]

[CR6] 6.Taïk, A. & Cherkaoui, S. ‘Electrical Load Forecasting Using Edge Computing and Federated Learning’, in ICC 2020–2020 IEEE International Conference on Communications (ICC), Jun. pp. 1–6. (2020). 10.1109/ICC40277.2020.9148937

[CR7] 7.Liu, H., Zhang, X., Shen, X. & Sun, H. ‘A federated learning framework for smart grids: Securing power traces in collaborative learning’, Nov. 01, 2021, arxiv: arxiv:2103.11870. 10.48550/arXiv.2103.11870

[CR8] 8.Li, Q., Diao, Y., Chen, Q. & He, B. ‘Federated Learning on Non-IID Data Silos: An Experimental Study’, in 2022 IEEE 38th International Conference on Data Engineering (ICDE), May pp. 965–978. (2022). 10.1109/ICDE53745.2022.00077

[CR9] 9.Zhu, H., Xu, J., Liu, S. & Jin, Y. ‘Federated learning on non-IID data: A survey’, Neurocomputing, vol. 465, pp. 371–390, Nov. (2021). 10.1016/j.neucom.2021.07.098

[CR10] 10.Savi, M. & Olivadese, F. Short-Term energy consumption forecasting at the edge: A federated learning approach. IEEE Access.9, 95949–95969. 10.1109/ACCESS.2021.3094089 (2021). [Google Scholar]

[CR11] 11.Briggs, C., Fan, Z. & Andras, P. Federated learning for Short-Term residential load forecasting. IEEE Open. Access. J. Power Energy. 9, 573–583. 10.1109/OAJPE.2022.3206220 (2022). [Google Scholar]

[CR12] 12.He, Y., Luo, F., Ranzi, G. & Kong, W. ‘Short-Term Residential Load Forecasting Based on Federated Learning and Load Clustering’, in 2021 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. pp. 77–82. (2021). 10.1109/SmartGridComm51999.2021.9632314

[CR13] 13.Tun, Y. L., Thar, K., Thwal, C. M. & Hong, C. S. ‘Federated Learning based Energy Demand Prediction with Clustered Aggregation’, in IEEE International Conference on Big Data and Smart Computing (BigComp), Jan. 2021, pp. 164–167. (2021). 10.1109/BigComp51126.2021.00039

[CR14] 14.Gholizadeh, N. & Musilek, P. ‘Federated learning with hyperparameter-based clustering for electrical load forecasting’, Internet Things, vol. 17, p. 100470, Mar. (2022). 10.1016/j.iot.2021.100470

[CR15] 15.Fernández, J. D., Menci, S. P., Lee, C. M., Rieger, A. & Fridgen, G. Privacy-preserving federated learning for residential short-term load forecasting. Appl. Energy. 326, 119915. 10.1016/j.apenergy.2022.119915 (Nov. 2022).

[CR16] 16.Duttagupta, A., Zhao, J. & Shreejith, S. ‘Exploring Lightweight Federated Learning for Distributed Load Forecasting’, in 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Oct. pp. 1–6. (2023). 10.1109/SmartGridComm57358.2023.10333889

[CR17] 17.Wang, Y. & Guo, Q. Privacy-Preserving and adaptive federated deep learning for multiparty wind power forecasting. IEEE Trans. Ind. Appl. 1–11. 10.1109/TIA.2024.3430229 (2024).

[CR18] 18.Fekri, M. N., Grolinger, K. & Mir, S. Distributed load forecasting using smart meter data: federated learning with recurrent neural networks. Int. J. Electr. Power Energy Syst. 137, 107669. 10.1016/j.ijepes.2021.107669 (May 2022).

[CR19] 19.Hu, Y., Ren, H., Hu, C., Deng, J. & Xie, X. An Element-Wise Weights Aggregation Method for Federated Learning, 2023 IEEE International Conference on Data Mining Workshops (ICDMW), Shanghai, China, 2023, pp. 188–196. 10.1109/ICDMW60847.2023.00031

[CR20] 20.Hu, Z., Shaloudegi, K., Zhang, G. & Yu, Y. Federated Learning Meets Multi-Objective Optimization, in IEEE Transactions on Network Science and Engineering, vol. 9, no. 4, pp. 2039–2051, 1 July-Aug. (2022). 10.1109/TNSE.2022.3169117

[CR21] 21.Chifu, V., Cioara, T., Anitiei, C., Pop, C. & Anghel, I. ‘FedWOA: A Federated Learning Model that uses the Whale Optimization Algorithm for Renewable Energy Prediction’, Sep. 19, 2023, arXiv: arXiv:2309.10337. 10.48550/arXiv.2309.10337

[CR22] 22.Raiaan, M. A. K., Sakib, S., Fahad, N. M., Mamun, A. A., Rahman, M. A., Shatabda, S. & Mukta, M. S. H. A systematic review of hyperparameter optimization techniques in convolutional neural networks. Decis. Analytics J. 11, 100470. 10.1016/j.dajour.2024.100470 (2024). ISSN 2772-6622.

[CR23] 23.Zhou, J., Pal, S., Dong, C. & Wang, K. Enhancing quality of service through federated learning in edge-cloud architecture. Ad Hoc Netw. 156, 103430. 10.1016/j.adhoc.2024.103430 (2024). ISSN 1570-8705.

[CR24] 24.Kundroo, M. & Kim, T. Federated learning with hyper-parameter optimization. J. King Saud University-Computer Inform. Sci. 35 (9), 101740 (2023).

[CR25] 25.Qolomany, B., Ahmad, K., Al-Fuqaha, A. & Qadir, J. Particle Swarm Optimized Federated Learning For Industrial IoT and Smart City Services, GLOBECOM 2020–2020 IEEE Global Communications Conference, Taipei, Taiwan, pp. 1–6, (2020). 10.1109/GLOBECOM42002.2020.9322464

[CR26] 26.Fahd, N. et al. Pelican optimization algorithm with federated learning driven attack detection model in internet of things environment. Future Generation Comput. Syst. 148, 118–127. 10.1016/j.future.2023.05.029 (2023). ISSN 0167-739X.

[CR27] 27.Połap, D. et al. A heuristic approach to the hyperparameters in training spiking neural networks using spike-timing-dependent plasticity. Neural Comput. Applic. 34, 13187–13200. 10.1007/s00521-021-06824-8 (2022).

[CR28] 28.Bukhari, S. M. S., Moosavi, S. K. R., Zafar, M. H., Mansoor, M., Mohyuddin, H., Ullah, S. S., … Sanfilippo, F. Federated transfer learning with orchard-optimized Conv-SGRU: A novel approach to secure and accurate photovoltaic power forecasting. Renewable Energy Focus 48, 100520 (2024).

[CR29] 29.Michalakopoulos, V. et al. A machine learning-based framework for clustering residential electricity load profiles to enhance demand response programs. Appl. Energy 361, 122943. 10.1016/j.apenergy.2024.122943 (2024). ISSN 0306-2619.

[CR30] 30.Li, J. et al. Federated learning-based short-term building energy consumption prediction method for solving the data silos problem. Build. Simul. 15, 1145–1159. 10.1007/s12273-021-0871-y (2022).

[CR31] 31.Petrangeli, E., Tonellotto, N. & Vallati, C. Performance evaluation of federated learning for residential energy forecasting. IoT 3 (3), 381–397 (2022).

[CR32] 32.Michalakopoulos, V., Sarantinopoulos, E., Sarmas, E. & Marinakis, V. Empowering federated learning techniques for privacy-preserving PV forecasting. Energy Rep. 12, 2244–2256. 10.1016/j.egyr.2024.08.033 (2024). ISSN 2352-4847.

[CR33] 33.Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by simulated annealing. Science, New Series 220 (4598), 671–680 (May 13, 1983).

[CR34] 34.Man, K. F., Tang, K. S. & Kwong, S. Genetic Algorithms: Concepts and Applications, IEEE Transactions on Industrial Electronics, Vol. 43, No. 5, 519 (October 1996).

[CR35] 35.UK Power Networks. SmartMeter Energy Consumption Data in London Households, https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households

[CR36] 36.Arthur, D. & Vassilvitskii, S. K-means++: the advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms (SODA ‘07). Society for Industrial and Applied Mathematics, USA, 1027–1035. (2007).

[CR37] 37.Kaufman, L. & Rousseeuw, P. J. Finding Groups in Data: An Introduction to Cluster Analysis, ISBN: 9780471878766, Online ISBN: 9780470316801 (1990). 10.1002/9780470316801

[CR38] 38.Murtagh, F. & Contreras, P. Algorithms for hierarchical clustering: an overview. WIREs Data Min. Knowl. Discov. 2, 86–97. 10.1002/widm.53 (2012).

[CR39] 39.McMahan, H. B., Moore, E., Ramage, D., Hampson, S. & Aguera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) (2017).

[CR40] 40.Spring Boot https://spring.io/projects/spring-boot

[CR41] 41.Python https://www.python.org/downloads/release/python-3123/

[CR42] 42.Tensorflow https://github.com/tensorflow/tensorflow/releases

[CR43] 43.Pandas https://pandas.pydata.org/

[CR44] 44.scikit-learn https://scikit-learn.org/stable/

[CR45] 45.SciPy https://scipy.org/

[CR46] 46.argparse https://docs.python.org/3/library/argparse.html

[CR47] 47.Protobuf https://protobuf.dev/

[CR48] 48.Matplotlib https://matplotlib.org/

[CR49] 49.Jenetics https://jenetics.io/

[CR50] 50.Heuristic-Based Federated Learning on GitHub. https://github.com/mihaid150/Heuristic-Adaptive-Federated-Learning

[CR51] 51.Keras https://keras.io/

[CR52] 52.Agarap, A. F. M. Deep Learning using Rectified Linear Units (ReLU), (2018). https://arxiv.org/abs/1803.08375

[CR53] 53.Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization, (2014). https://arxiv.org/abs/1412.6980

[CR54] 54.Brisimi, T. S. et al. Federated learning of predictive models from federated electronic health records. Int. J. Med. Inf. 112, 59–67. 10.1016/j.ijmedinf.2018.01.007 (Apr. 2018).

[CR55] 55.Choudhury, O. et al. Differential Privacy-enabled Federated Learning for Sensitive Health Data, Feb. 27, 2020, arXiv: arXiv:1910.02578. 10.48550/arXiv.1910.02578

[CR56] 56.McMahan, H. B., Ramage, D., Talwar, K. & Zhang, L. Learning Differentially Private Recurrent Language Models, Feb. 23, 2018, arXiv: arXiv:1710.06963. 10.48550/arXiv.1710.06963

[CR57] 57.Wu, X., Liang, Z. & Wang, J. FedMed: A Federated Learning Framework for Language Modeling, Sensors, vol. 20, no. 14, Art. no. 14, Jan. (2020). 10.3390/s20144048

[CR58] 58.Liu, Y., Yu, J. J. Q., Kang, J., Niyato, D. & Zhang, S. Privacy-Preserving Traffic Flow Prediction: A Federated Learning Approach, IEEE Internet Things J., vol. 7, no. 8, pp. 7751–7763, Aug. (2020). 10.1109/JIOT.2020.2991401

[CR59] 59.Perifanis, V., Pavlidis, N., Koutsiamanis, R. A. & Efraimidis, P. S. Federated learning for 5G base station traffic forecasting. Comput. Netw. 235, 109950. 10.1016/j.comnet.2023.109950 (Nov. 2023).

[CR60] 60.Microsoft PowerPoint https://www.microsoft.com/ro-ro/microsoft-365/powerpoint
