A machine learning ensemble framework based on a clustering algorithm for improving electric power consumption performance

Taeyong Sim; Sanghyun Ryu; Dongjun Lee; Sujin Lee; Chang-Jae Chun; Hyeonjoon Moon

doi:10.1038/s41598-025-23978-w

2025 Nov 17;15:40172. doi: 10.1038/s41598-025-23978-w


PMCID: PMC12623421 PMID: 41249310

Abstract

Accurate prediction of electric energy consumption is critical for both user convenience and supplier efficiency. This study introduces an ensemble approach that integrates clustering algorithms with machine learning (ML) models to enhance prediction accuracy by identifying consumption patterns within buildings. The research focused on residential apartments in the metropolitan area of Korea, utilizing four evaluation methods (Elbow-Method, Silhouette Score, Calinski-Harabasz Index, and Dunn Index) across five data collection intervals (10 min, 1 h, 1 day, 1 week, and 1 month). Five ML models (CatBoost, Decision Tree, LightGBM, Random Forest, XGBoost) were assessed for their prediction performance across clusters. Additionally, ML models that exhibited high performance within each cluster were amalgamated into an ensemble model to assess the predictive performance regarding total electric energy consumption at the research site. Optimal clustering resulted in two clusters (142 houses for C0, 206 houses for C1) using monthly resampled power data. CatBoost and LightGBM exhibited the highest average prediction performance. Based on the possible combinations of the two models applied to each cluster, four ensemble models were developed: CB-CB, CB-LGBM, LGBM-CB, and LGBM-LGBM. Statistical analysis confirmed that all ensemble models significantly outperformed the control group’s traditional ML approaches without clustering (p < 0.05 or 0.01). The proposed clustering-based ML ensemble model in this study can predict the energy consumed in buildings more accurately by accounting for the unique consumption pattern of each house. It is anticipated to contribute effectively to energy consumption reduction.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-23978-w.

Keywords: Electrical energy, Cluster, Machine learning, Ensemble model, Optimization

Subject terms: Engineering, Environmental sciences, Mathematics and computing

Introduction

Electric power energy refers to the capability of electrical energy to perform work, measured in terms of current per unit time or the electrical energy transferred or converted per unit time¹. This energy is crucial in maintaining residential environments through Heating, Ventilation, and Air Conditioning (HVAC) systems and various household appliances. The electric energy consumed in residential settings represents a significant portion of total energy usage², and this consumption has been rising due to the deployment of electric-based indoor environment systems motivated by the push for decarbonization^3,4. However, unconditional regulations and conservation measures that ignore residents’ electricity needs might compromise the quality of living and working environments^5–9. Therefore, accurate prediction of power energy consumption is necessary, as such predictions can enhance user convenience, support efficient management and operation of energy systems, and contribute to reducing carbon emissions¹⁰.

Power consumption in residential buildings exhibits varied usage patterns based on factors such as weather, surrounding environment, consumer electronics usage patterns, number of residents, and type of residence¹¹. This variability is particularly pronounced in South Korea, where seasonal and periodic characteristics according to temperature metrics by periods are observed^12,13, making electric energy consumption highly sensitive to external conditions such as weather. The continental climate of Korea’s metropolitan areas creates distinct seasonal variations with hot summers and cold winters, leading to significant fluctuations in HVAC system usage throughout the year. While engineering-based methods involving mathematical modeling of buildings¹⁴ are often complex and time-consuming, Artificial Intelligence (AI)-based methods utilize historical and current energy usage data to forecast future needs with greater accuracy, efficiency, and practicality^15,16. Therefore, this study utilizes an AI-based approach to accurately predict electric power consumption for maintaining residential environments in Korean apartment complexes.

Research on artificial intelligence-based methodologies for electric energy usage prediction has been continuously advancing. Various Machine Learning (ML) algorithms have been employed to achieve accurate power consumption predictions, with gradient boosting models showing particularly strong performance. Abumohsen, M. et al.¹⁷ demonstrated that Random Forest models (R² Score = 0.877) outperformed XGBoost (R² Score = 0.811) and linear regression (R² Score = 0.637) for power usage prediction. Yin, Z. et al.¹⁸ showed that LightGBM achieved superior performance (R² Score = 0.930) compared to other ML algorithms for equipment power consumption. Zhang, L. et al.¹⁹ compared hybrid models derived from CatBoost, LightGBM, and XGBoost for short-term load forecasting, with the XGBoost-AOA model achieving the highest performance (R² Score = 0.922). These studies collectively indicate that gradient boosting algorithms can achieve high prediction accuracy in energy consumption forecasting, with R² Scores above 0.90 in the latter two studies. Furthermore, enhancement methods including optimal input variable construction^20–22, hyperparameter adjustments²³, and transfer learning²³ have been proposed. However, these approaches often require extensive data collection and incur significant time and resource costs while carrying risks of ineffectiveness with incomplete data^24,25.

Analyzing power energy usage by clusters with similar consumption patterns can enhance prediction performance without requiring additional data collection or complex model development²⁶. This approach addresses the heterogeneity in residential energy consumption by grouping households with similar usage behaviors, enabling more targeted predictions. Han, F. et al.²⁷ introduced a short-term prediction method employing K-Means clustering combined with pooling deep RNN (PDRNN) for residential load forecasting. Their approach achieved MAE of 3.62% and RMSE of 1.66% for 920 customers in Ireland, significantly outperforming traditional models including ARIMA, RNN, SVR, and DRNN. Li, K. et al.²⁸ applied K-Means clustering to forecast short-term power load of buildings, demonstrating that clustering-based ensemble learning improved model generalization. Their PSO-ELM model showed substantial improvement with MAPE reducing from 1.49 to 1.06 after clustering implementation. Similarly, Culaba, A. B. et al.²⁹ classified buildings based on consumption patterns and peaks, achieving 46% reduction in Mean Bias Error (MBE) and 10% reduction in Root Mean Square Error (RMSE) compared to non-clustered methods. These studies utilized various performance metrics including MAE, MAPE, RMSE, and MBE to evaluate prediction accuracy, demonstrating the effectiveness of clustering approaches. However, previous studies did not include systematic processes for quantitatively evaluating optimal clustering parameters, often relying on arbitrary selection of cluster numbers without comprehensive validation. Furthermore, these studies overlooked the potential of leveraging cluster-specific models to improve overall building or complex-wide consumption predictions³⁰. This limitation presents an opportunity to develop a more robust clustering-based ensemble framework that optimizes both clustering parameters and model selection.

This study introduces a novel ensemble framework that systematically integrates optimized clustering with machine learning to predict electric power consumption in Korean residential apartments. Unlike previous studies that lacked systematic optimization or applied single models across all clusters, this framework ensures methodological rigor through quantitative validation while achieving practical improvements in consumption pattern identification and prediction accuracy. The proposed methodology advances existing research through three key innovations: (1) quantitative optimization of clustering parameters using four evaluation metrics (Elbow-Method, Silhouette Score, Calinski-Harabasz Index, and Dunn Index) across multiple time intervals to identify optimal clustering conditions, (2) development of cluster-specific ensemble models by selecting and combining the best-performing ML algorithms (CatBoost, LightGBM, XGBoost) for each cluster rather than applying uniform models, and (3) empirical validation on 348 households in Korea’s metropolitan apartment complexes, addressing the unique energy consumption patterns in regions with distinct seasonal variations.

Method

Environment and proposal framework

The research process, which analyzes the power energy usage pattern of all households in the empirical apartment complexes and predicts power consumption in clustered households and entire complexes, is depicted in Fig. 1. The dataset comprises smart-meter readings for each household at 10-minute intervals, reconstructed into instantaneous-usage series at five aggregation levels (10 min, 1 h, 1 day, 1 week, and 1 month); a quality audit screens records for timestamp gaps and outliers, and a two-stage imputation restores completeness—same-time cross-household averaging for long gaps followed by short-gap linear interpolation—thereby preserving diurnal/seasonal structure and improving clustering stability.

Fig. 1 — Proposed framework in this study.

For clustering, consumption-pattern feature vectors are formed within each candidate resolution and standardized with StandardScaler (mean zero, unit variance) only for the K-Means step, whereas temporal and meteorological variables are appended after clustering for supervised prediction (i.e., K-Means uses no mixed weather–load vectors to avoid cross-domain scale effects). The number of clusters K and the aggregation interval are selected per resolution by fitting K-Means and applying a combined validity–stability protocol—Elbow (first sharp drop in inertia) together with maximization of Silhouette, Calinski–Harabasz, and Dunn indices and a 10-run stability check (cluster-size coefficient of variation < 0.5 with no tiny clusters)—and the resulting (interval, K) setting carries forward to forecasting (“Clustering” in Fig. 1). Predictive performance is then evaluated under a unified, time-aware training procedure: the non-clustered baseline and the cluster-specific predictors (Decision Tree, Random Forest, CatBoost, LightGBM, XGBoost) undergo grid search with rolling-origin (forward-chaining) time-series cross-validation using 10 splits, boosting methods employ early stopping on the validation block, final models are refit on the full training window, and a single evaluation is conducted on the held-out chronological test window. Complex-level demand is obtained by deterministic summation of synchronized cluster forecasts without a second-stage meta-learner (no stacking, blending, or bagging). Performance is compared with the traditional prediction method that does not involve clustering (“Data Analysis (General Method)”) using MAE, MSE, RMSE, and R², with inferential testing by ANOVA with post-hoc comparisons and, where appropriate, independent t-tests.
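The rolling-origin (forward-chaining) validation mentioned above can be illustrated with a short sketch. It is not the authors' implementation: the function name `rolling_origin_splits` and the choice of equal-sized validation blocks are assumptions made for illustration, and scikit-learn's `TimeSeriesSplit` offers a comparable expanding-window splitter.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_samples: int, n_splits: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) pairs with an expanding training window.

    Each validation block directly follows its training window, so a model
    is never fitted on observations that come after the block it is scored on.
    """
    block = n_samples // (n_splits + 1)  # size of every validation block
    if block == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - k + 1) * block
        yield list(range(train_end)), list(range(train_end, train_end + block))


# Hypothetical example: 22 time-ordered samples, 10 splits as in the text.
splits = list(rolling_origin_splits(n_samples=22, n_splits=10))
for train_idx, valid_idx in splits[:3]:
    print(len(train_idx), valid_idx)
```

The training window grows by one block per split while the validation block slides forward, which is what distinguishes this scheme from shuffled k-fold cross-validation.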

Data collection and processing

In this study, electric power energy usage data from 348 households in residential apartment complexes in the Republic of Korea were recorded. This dataset was collected by the Korea Institute of Energy Research (KIER) between 23:20 on July 17, 2022 and 15:30 on June 5, 2024. The dataset was provided exclusively for research purposes under institutional data-sharing agreements and is not publicly available. Data were gathered every 10 min from smart meters installed in each household, and the cumulative power energy consumption for each household was formulated as Eq. (1).

$$E_i(t_e) = \sum_{t = t_s}^{t_e} e_i(t) \quad [\mathrm{Wh}] \tag{1}$$

In Eq. (1), $i$ denotes an individual household within the complex and $e_i(t)$ the energy used by that household in the metering interval ending at $t$. The watt-hour [Wh], a unit of energy, expresses the electricity $E_i$ consumed by each household from the starting point $t_s$ to the ending point $t_e$. To predict energy consumption using the ML model, the type of power consumption data was converted from integrated to instantaneous usage. The instantaneous consumption of power energy by each household was calculated using Eq. (2). The instantaneous power consumption $e_i(t_k)$ of each household was determined as the difference between the integrated usage ($E_i$) at the two time points $t_{k-1}$ and $t_k$.

$$e_i(t_k) = E_i(t_k) - E_i(t_{k-1}) \tag{2}$$
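The conversion from integrated to instantaneous usage described above amounts to differencing consecutive meter readings. The sketch below is illustrative only: the function name `interval_usage` and the sample readings are made up, and marking negative differences as missing mirrors the paper's treatment of negative instantaneous values as invalid measurements.

```python
from typing import List, Optional


def interval_usage(cumulative: List[float]) -> List[Optional[float]]:
    """Difference consecutive cumulative meter readings (Wh).

    The first reading has no predecessor, so its usage is None.
    Negative differences are also returned as None so that a later
    imputation step can treat them as missing values.
    """
    usage: List[Optional[float]] = [None]
    for prev, curr in zip(cumulative, cumulative[1:]):
        delta = curr - prev
        usage.append(delta if delta >= 0 else None)
    return usage


# Made-up cumulative readings for one household at 10-minute steps.
readings = [1200.0, 1250.0, 1310.0, 1305.0, 1400.0]
print(interval_usage(readings))  # [None, 50.0, 60.0, None, 95.0]
```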

Preprocessing addressed outliers and missing values, and the data were segmented into five aggregation intervals (10 min, 1 h, 1 day, 1 week, and 1 month) to assess the effect of temporal resolution on clustering and forecasting. Outliers typically included records outside the meter’s valid range, timestamps beyond the measurement period, invalid household identifiers, or negative instantaneous values; such records were removed, and completeness was then restored by a two-stage imputation sequence that first replaced long gaps for a given household with the same-time average across other households (thereby preserving seasonal and calendar effects) and subsequently applied linear interpolation to residual short gaps to maintain temporal continuity. This ordering limits instability arising from long gaps—where simple interpolation performs poorly—and preserves diurnal/seasonal structure needed for feature construction and downstream modeling.
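The two-stage imputation order can be made concrete with a small sketch. Everything here is an assumption for illustration: the helper names, the toy series, and in particular the `long_gap` threshold, since the text does not state how long a gap must be to count as "long".

```python
from typing import Dict, Iterator, List, Optional, Tuple

Series = List[Optional[float]]


def gap_runs(series: Series) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs (end exclusive) for each run of None."""
    start = None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(series)


def fill_long_gaps(data: Dict[str, Series], long_gap: int) -> Dict[str, Series]:
    """Stage 1: fill gaps of at least `long_gap` steps with the same-time
    mean of the other households (left as None if no one has a value)."""
    out = {h: list(s) for h, s in data.items()}
    for h, s in data.items():
        for start, end in gap_runs(s):
            if end - start < long_gap:
                continue
            for t in range(start, end):
                others = [data[o][t] for o in data if o != h and data[o][t] is not None]
                if others:
                    out[h][t] = sum(others) / len(others)
    return out


def interpolate_linear(series: Series) -> Series:
    """Stage 2: linearly interpolate interior gaps; edge gaps stay None."""
    out = list(series)
    for start, end in gap_runs(series):
        if start == 0 or end == len(series):
            continue
        left, right = series[start - 1], series[end]
        width = end - start + 1
        for k, t in enumerate(range(start, end), start=1):
            out[t] = left + (right - left) * k / width
    return out


def impute(data: Dict[str, Series], long_gap: int = 3) -> Dict[str, Series]:
    """Run stage 1, then stage 2 on whatever gaps remain."""
    stage1 = fill_long_gaps(data, long_gap)
    return {h: interpolate_linear(s) for h, s in stage1.items()}


# Toy data: household A has a long gap, household B a short one.
data = {
    "A": [1.0, None, None, None, 5.0, 6.0],
    "B": [2.0, 2.0, 4.0, 6.0, None, 8.0],
    "C": [4.0, 4.0, 6.0, 8.0, 9.0, 10.0],
}
filled = impute(data, long_gap=3)
print(filled["A"], filled["B"])
```

Filling long gaps first keeps the interpolation step from drawing a straight line across hours or days of missing data, which is the instability the paragraph above refers to.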

Weather data from the Korea Meteorological Administration’s Open MET Data Portal³¹ were used as inputs for supervised prediction. The KMA provides hourly observations from 105 ground stations nationwide; the candidate set comprised 20 fields (external temperature, ground temperature, dew-point temperature, humidity, rainfall, snowfall, snowfall in the last 3 h, wind speed, wind direction, vapor pressure, local atmospheric pressure, sea-level atmospheric pressure, sunshine, solar radiation, total cloud cover, mid-low cloud cover, low-level cloud cover, cloud shape, visibility, and ground state). Of these, 17 variables were retained, excluding cloud shape, low-level cloud cover, and ground state. The exclusion is justified by a correlation heat-map and principal component analysis conducted over the candidate weather features, which showed that the three descriptors are highly collinear with retained radiative and cloud-amount proxies (sunshine/solar radiation; total and mid–low cloud cover) and contribute negligible unique variance in leading components. Retaining physically direct radiative/thermodynamic drivers (temperature, humidity, pressures, sunshine/solar radiation, cloud amounts, wind) and omitting qualitative or overlapping codes reduces redundancy, avoids encoding heterogeneity, and limits noise propagation in the supervised learners. Table 1 summarizes the input variables (Date, Weather, House ID) and the dependent variable (Electric Power Usage).

Table 1.

Data sets collected from the empirical area by open MET data Portal.

Type	Column	Measurement Range	Unit	Description
Date	YEAR	2022–2024	-	-
	MONTH	1, 2, 3, …, 12	-	-
	DAY	1, 2, 3, …, 31	-	-
	HOUR	1, 2, 3, …, 23	-	-
	MINUTE	1, 2, 3, …, 59	-	-
	DAY_OF_THE_WEEK	Mon, Tue, Wed, …, Sun	-	-
Weather	temp_outdoor	Continuous	℃	Outdoor temperature
	temp_dew_point	Continuous	℃	Dew point temperature
	temp_ground	Continuous	℃	Ground temperature
	humidity	Continuous	%	Humidity
	rainfall	Continuous	mm	Rainfall amount
	snowfall	Continuous	cm	The amount of snowfall
	snowfall_3hr	Continuous	cm	The amount of snowfall in the past three hours
	wind_speed	Continuous	m/s	Wind speed
	wind_direction	0, 1, 2, …, 16	Compass directions spanning 16 points	Wind direction
	pressure_vapor	Continuous	hPa	Vapor atmospheric pressure
	pressure_area	Continuous	hPa	Observatory atmospheric pressure
	pressure_sea	Continuous	hPa	Sea level atmospheric pressure
	sunshine	Continuous	h	Incidence of sunlight
	solar_radiation	Continuous	MJ (Mega Joule)	Amount of solar insolation
	cloud_total	0, 1, 2, …, 10	-	Cloud coverage
	cloud_midlow	0, 1, 2, …, 10	-	Mid-low layer cloud coverage
	visual_range	Continuous	10 m	Visibility range
Residence	HOUSE_ID_BUILDING	0, 1, 2	-	Building in a residential complex
	HOUSE_ID_FLOOR	0, 1, 2, …, 24	-	Building floor index
	HOUSEHOLD_ID	0, 1, 2, …348	-	Unique identifier for each household
Energy Usage	usage_ACCU_h	Continuous	Electricity: kWh	Total energy consumption
Energy Usage	usage_INST_h	Continuous	Electricity: kWh	Real-time energy consumption


Method of clustering each household according to the pattern of power energy consumption in apartment complexes in the empirical area

The K-Means clustering technique³² was utilized to cluster the households based on their power consumption patterns within the empirical apartment complex; the algorithm partitions the data into the number of clusters set by the hyperparameter K and updates the cluster centroids to minimize the within-cluster sum of squared Euclidean distances, as expressed by

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2, \qquad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$$

where $C_k$ denotes the set of households assigned to cluster $k$, $|C_k|$ its size, $x$ the consumption-pattern feature vector, and $\mu_k$ the centroid of cluster $k$. Because K-Means relies on Euclidean distances and is therefore scale-sensitive, the feature vectors used for clustering (consumption-pattern vectors constructed within each candidate temporal resolution) were standardized to zero mean and unit variance using a StandardScaler (scikit-learn library), so that distances reflect pattern differences rather than measurement scales; the transform is

$$z_j = \frac{x_j - m_j}{s_j}$$

where $m_j$ and $s_j$ are the mean and standard deviation of feature $j$ across households.

Clustering was performed on consumption-pattern features only to avoid cross-domain scale confounding, while temporal and meteorological variables were reserved for the subsequent supervised prediction stage. To ascertain the optimal K (number of clusters), the Elbow-method³³, Silhouette Score³⁴, Calinski–Harabasz Index (CHI)³⁵, and Dunn Index³⁶ were employed and compared across a specified range of K at each of five temporal resolutions (10-min, 1-h, 1-d, 1-w, 1-m); the Elbow analysis identified the first sharp drop in inertia relative to its initial value, while the three clustering quality indices were used jointly to maximize separation and cohesion in a manner that mitigates the subjectivity of a single criterion. In addition, clustering stability was examined by performing 10 independent runs per (interval, K) setting and computing the coefficient of variation of cluster sizes; settings were retained only if the cluster-size coefficient of variation was below 0.5 and no tiny or volatile clusters appeared, after which the optimal conditions were established. Given that Euclidean K-Means is not shift-invariant at sub-daily horizons and can under-represent load-shape or phase differences, the choice of resampling interval was treated as part of the model design and assessed empirically by the multi-metric validity and stability procedure above; weekly/monthly aggregation yielded higher validity scores and stable partitions and thus was retained for downstream modeling, with the acknowledged limitation that sub-daily phase information is attenuated at coarser resolutions.
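As a rough illustration of the standardize-then-cluster step, the sketch below applies scikit-learn's `StandardScaler` and `KMeans` to synthetic data and recomputes the within-cluster sum of squares by hand. The data, the feature layout (one row per household, one column per aggregation period), and all variable names are assumptions; the paper's actual feature construction is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 6 "low" and 4 "high" households, 12 periods each.
low = rng.normal(loc=200.0, scale=10.0, size=(6, 12))
high = rng.normal(loc=400.0, scale=10.0, size=(4, 12))
X = np.vstack([low, high])

# Column-wise standardization to zero mean and unit variance.
Z = StandardScaler().fit_transform(X)

# K-Means with K = 2; n_init restarts guard against a poor initialization.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z)

# Recompute the objective (within-cluster sum of squared distances) by hand
# and compare it with the value scikit-learn reports as inertia_.
J = sum(
    float(np.sum((Z[km.labels_ == k] - km.cluster_centers_[k]) ** 2))
    for k in range(2)
)
sizes = sorted(np.bincount(km.labels_).tolist())
print(sizes, round(J, 4), round(km.inertia_, 4))
```

Repeating the fit with different `random_state` values and recording `sizes` each time is one way to reproduce the kind of 10-run stability check described above.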

Clustering was executed under the optimal conditions identified through the process described above, and the power energy prediction performance of five ML models was assessed based on the clustering outcomes; the predictive accuracy of each model was evaluated using MAE, MSE, RMSE, and R², through which the clustering conditions demonstrating the best performance were finally established.

Performance evaluation of ML models for prediction on electric power consumption in empirical apartment complexes

ML models for electric power consumption prediction

The power energy consumption and weather data used in this study were preprocessed, classified under variables such as Date, Weather, and House, and employed as input variables (Table 1). The instantaneous consumption across clusters and apartment complexes housing multiple families was used as the target, so that forecasting is one-step-ahead at the same temporal resolution as the input aggregation (10-min → t + 10 min; 1-h → t + 1 h; 1-d → t + 1 day; 1-w → t + 1 week; 1-m → t + 1 month). The datasets were then partitioned chronologically in a 7:3 ratio into Training and Testing sets (no shuffling, the test window being the most recent segment), and the prediction outcomes on the held-out test window, which was excluded from model training, were used as performance indicators. The models employed in the predictive analysis were CatBoost (iterations = 500, max_ctr_complexity = 6, random_seed = 10, od_type = 'Iter', od_wait = 25, verbose = 1000, depth = 5, learning_rate = 0.03), Decision Tree (max_depth = 8), LightGBM (n_estimators = 10000, learning_rate = 0.01, verbose = 0), Random Forest (max_depth = 8, min_samples_leaf = 8, min_samples_split = 8, n_estimators = 200), and XGBoost (n_estimators = 1000). All algorithms were tuned on the training window via rolling-origin (forward-chaining) time-series cross-validation with 10 splits, with boosting methods employing early stopping on the validation block so that the effective number of boosting rounds was governed by validation loss rather than nominal maxima; final models were refit on the full training window and evaluated once on the held-out chronological test window.

For ensemble integration, the highest-performing predictor is assigned to each cluster (C0, C1) from the tuned candidate set, and the apartment-complex total at time t is obtained by deterministic additive aggregation of synchronized cluster predictions, $\hat{y}_{\mathrm{complex}}(t) = \hat{y}_{C0}(t) + \hat{y}_{C1}(t)$, a choice that follows the accounting identity that complex-level demand equals the sum of segment demands and does not introduce any second-stage meta-learner (no stacking, blending, bagging, or weighted voting).
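The chronological 7:3 split and the additive aggregation of cluster forecasts are both simple enough to sketch directly. The function names and the sample numbers below are hypothetical, and the per-cluster forecasts are assumed to have been produced already by whichever model was selected for each cluster.

```python
from typing import Dict, List, Sequence, Tuple


def chronological_split(
    series: Sequence[float], train_ratio: float = 0.7
) -> Tuple[List[float], List[float]]:
    """Split without shuffling; the most recent part becomes the test window."""
    cut = round(len(series) * train_ratio)
    return list(series[:cut]), list(series[cut:])


def aggregate_clusters(forecasts: Dict[str, Sequence[float]]) -> List[float]:
    """Sum time-aligned cluster forecasts into a complex-level forecast.

    No weights or second-stage model are involved; the total is the plain
    sum of the cluster predictions at each time step.
    """
    lengths = {len(f) for f in forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("cluster forecasts must cover the same time steps")
    return [sum(step) for step in zip(*forecasts.values())]


# Made-up series of 10 time-ordered observations.
train, test = chronological_split(list(range(10)))
print(train, test)

# Made-up forecasts for two clusters over the same two time steps.
total = aggregate_clusters({"C0": [1.0, 2.0], "C1": [3.0, 4.5]})
print(total)  # [4.0, 6.5]
```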

Prediction on electric power consumption

To demonstrate the validity of the proposed methodology, three methods for predicting power energy consumption at the empirical site were implemented. Initially, the electric energy consumption of the apartment complex was predicted without the use of the clustering technique, serving as a control case to examine whether there is an enhancement in performance with the proposed methodology. Subsequently, the predictive performance of each model by cluster was assessed through clustering fitness evaluation. For this comparison, outcomes were measured against three error metrics (MAE, MSE, and RMSE). Ultimately, the electric power consumption of the apartment complex was forecasted using an ensemble model that amalgamated the ML predictive models for each cluster. The predictive results of the Ensemble Model were appraised using four metrics (MAE, MSE, RMSE, and R² Score), and these outcomes were benchmarked against those of the Control Group.

Performance metrics and statistical analysis

In this study, a clustering-based ML ensemble model was proposed to enhance the prediction accuracy of electric power consumption models. To identify the optimal clustering conditions, various factors including the number of clusters and data collection intervals were defined. Performance was evaluated on the held-out chronological test window at each temporal resolution using four metrics (MAE, MSE, RMSE, R²):

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $y_i$ is the observed instantaneous power (kW) at test index $i$, $\hat{y}_i$ the corresponding prediction, $n$ the number of test samples, and $\bar{y}$ the sample mean of $y_i$.
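For reference, the four metrics can be computed with a few lines of plain Python. This is a generic sketch using the standard textbook definitions of MAE, MSE, RMSE, and R²; the function name and the sample values are made up.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> Dict[str, float]:
    """Return MAE, MSE, RMSE and R² for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {"MAE": mae, "MSE": mse, "RMSE": sqrt(mse), "R2": 1.0 - ss_res / ss_tot}


# Made-up observations and predictions.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE, MSE, and RMSE are errors (lower is better), whereas R² is a goodness-of-fit score (higher is better), which is why the per-cluster comparison and the ensemble comparison can rank models differently depending on the metric.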

Analysis of variance (ANOVA) with a post-hoc test was employed in this process. Additionally, the power energy usage predicted by the proposed ensemble model was compared with that of the control group using the independent t-test. Differences in prediction performance were considered statistically significant at p < 0.05 or p < 0.01. Statistical analyses were conducted using SPSS 15.0 software (SPSS Inc., Chicago, IL, USA). The hardware and software environments used for accessing the database, preprocessing data, and developing and evaluating the model included a workstation with a CPU: 13th Gen Intel® Core™ i9-13900KS, GPU: NVIDIA GeForce RTX 4090, RAM: Samsung DDR5 32 GB × 4 (128 GB), and a computing environment based on OS: Windows 10, Python version 3.10.9, and TensorFlow version 2.10.0.

Results

Data collection and processing

To demonstrate the effectiveness of the methodology in this study, electric power consumption data from 348 households in apartment complexes in the metropolitan area of Korea, along with regional weather data, were collected. A total of 33,837,156 rows of energy consumption data and 99,170 rows categorized as instantaneous use data by households were collected. Weather data corresponding to the same collection period for power energy use were recorded. The hourly weather data comprised 17 input variables and totaled 16,529 rows. Outliers in the dataset were identified: 3,799 instances of negative instantaneous use values across all households were deemed invalid measurements, converted to missing values, and removed. Missing values were then imputed in two steps. The first step replaced missing values with the average energy consumption of all households at the respective time, processing a total of 3,051,882 missing values across all households. The second step, linear interpolation, was applied to the remaining 1,372 cases, in which energy usage values were missing for all households. Figure 2 presents box plots of the processed instantaneous-consumption data at the five resolutions (markers denote mean, median, Q1, Q3), showing a systematic reduction in dispersion as the aggregation window widens, which is consistent with the use of weekly/monthly windows for stable clustering. Figure 3 shows line plots of the complex-level mean instantaneous consumption at the same resolutions: fine scales reveal diurnal variability and short-lived fluctuations, whereas coarse aggregation suppresses high-frequency noise and makes seasonal structure apparent.

Fig. 2 — Box plots of processed instantaneous consumption by time resolution (10-min, 1-h, 1-d, 1-w, 1-m).

Fig. 3 — Complex-level mean instantaneous consumption at five resolutions (10-min → 1-m); fine scales capture diurnal variability and transients, while coarse scales highlight seasonal structure.

Clustering based on electric power consumption patterns in empirical apartment complexes

In this step, the clustering results of the K-Means algorithm³² were quantitatively evaluated using the four methods previously mentioned (Elbow-Method, Silhouette Score, CHI, and Dunn Index). Figure 4 shows the inertia, Silhouette Score, CHI, and Dunn Index for each time interval of the dataset over the evaluated range of K.

Fig. 4 — Variation in clustering coefficients across different intervals ((a) Inertia, (b) Silhouette Score, (c) CHI, and (d) Dunn Index). Each graph depicts how the clustering coefficients vary with an increasing number of clusters, segmented into five intervals. The horizontal axis denotes K, while the vertical axis shows the values of the clustering coefficients.

Determination of optimal clustering conditions based on clustering validity assessment

In the Elbow-Method, the first point at which inertia has decreased by more than 60% as K increases, or the point beyond which further increases in K yield diminishing reductions in inertia, was taken as the region from which the optimal K was selected^37,38. The K values at which clustering is most effective for the 10-min interval are shown in Fig. 4a. Inertia is measured in units of 1 M (million) and is abbreviated accordingly on the graphs for simplicity. A higher Silhouette Score indicates a superior clustering outcome³⁴. The K at which the Silhouette Score peaked for the 10-min and 1-day intervals differed from the K at which it peaked for the other intervals (1 h, 1 week, 1 month) (Fig. 4b). Similarly, for the CHI, the K that yields the highest value is taken as optimal^37,38, and the optimum occurred at the same K for all time-interval conditions (Fig. 4c). Regarding the Dunn Index shown in Fig. 4d, the K that yields a high value is likewise favored, as with the two preceding metrics (Silhouette Score, CHI)^37,38.
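One possible reading of the 60% elbow criterion is sketched below: the smallest K whose inertia has fallen by more than a given fraction relative to the inertia at the smallest K evaluated. This interpretation, the function name `first_k_below`, and the inertia values are all assumptions for illustration; the paper does not spell out the exact rule.

```python
from typing import Dict, Optional


def first_k_below(inertia_by_k: Dict[int, float], drop: float = 0.6) -> Optional[int]:
    """Return the smallest K whose inertia has dropped by more than `drop`
    (as a fraction) relative to the inertia at the smallest K, or None."""
    ks = sorted(inertia_by_k)
    baseline = inertia_by_k[ks[0]]
    for k in ks[1:]:
        if (baseline - inertia_by_k[k]) / baseline > drop:
            return k
    return None


# Made-up inertia curve (arbitrary units).
inertia = {1: 100.0, 2: 55.0, 3: 38.0, 4: 33.0}
elbow_k = first_k_below(inertia)
print(elbow_k)  # 3: inertia has fallen by 62% relative to K = 1
```

A threshold rule like this is easy to automate but sensitive to the chosen baseline, which is one reason to cross-check it against the Silhouette, CHI, and Dunn results rather than rely on it alone.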

Because Euclidean K-Means is not shift-invariant, sub-daily phase shifts in load shape can blur cluster boundaries at fine resolutions. To address this, five candidate intervals (10-min, 1-h, 1-d, 1-w, 1-m) were compared using the four validity indices together with a 10-run stability screen based on the coefficient of variation (CV) of cluster sizes. Weekly/monthly aggregation produced higher validity scores and more stable partitions (CV < 0.5, no tiny clusters), and was therefore retained for downstream modeling.
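Of the four validity indices, the Silhouette Score and the Calinski–Harabasz Index are available in scikit-learn (`silhouette_score`, `calinski_harabasz_score`), while the Dunn Index usually has to be written by hand. The sketch below uses one common variant (minimum between-cluster point distance divided by maximum within-cluster diameter); whether the paper used this exact variant is not stated, and the toy points are made up.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence


def dunn_index(clusters: Dict[int, List[Sequence[float]]]) -> float:
    """Dunn index: smallest distance between points of different clusters
    divided by the largest distance between points of the same cluster.

    Larger values indicate compact, well-separated clusters. At least one
    cluster must contain two or more distinct points.
    """
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in clusters[a]
        for q in clusters[b]
    )
    max_diameter = max(
        dist(p, q)
        for points in clusters.values()
        for p, q in combinations(points, 2)
    )
    return min_separation / max_diameter


# Toy example: two clusters in two dimensions.
toy = {0: [(0.0, 0.0), (0.0, 1.0)], 1: [(4.0, 0.0), (4.0, 2.0)]}
print(dunn_index(toy))  # 2.0: closest pair is 4 apart, widest cluster spans 2
```

Because both the numerator and the denominator depend on single extreme pairs of points, the index reacts strongly to outliers, which may explain why it is used here alongside other indices rather than alone.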

When the clustering was repeated, the two clusters under the K = 2 condition were labeled C0 (small cluster) and C1 (large cluster), and the three clusters under the K = 3 condition were labeled C0 (small cluster), C1 (medium cluster), and C2 (large cluster). As a follow-up, the coefficient of variation (CV) of cluster size was calculated for each cluster under each condition. CV, a measure of data volatility, indicates higher group volatility with larger values. Values exceeding 0.3 suggest problems with the data or instability in its distribution³⁹. This study determined clustering conditions using a CV threshold of 0.5, at which the standard deviation is half of the mean. As shown in Table 2, when the data time interval was 10 min, 1 h, and 1 day, the CV for C0’s cluster size was 0.5 or greater, indicating unstable variation. Conversely, with data intervals of 1 week and 1 month, the CV was less than 0.5, indicating uniformly formed clusters. Thus, clustering was deemed stable for intervals of 1 week and 1 month, for both K = 2 and K = 3. Based on these findings, four clustering conditions were selected for model analysis (interval of 1 week with K = 2; interval of 1 week with K = 3; interval of 1 month with K = 2; and interval of 1 month with K = 3).

Table 2.

Assessments of clustering validity through repeated clustering under specific conditions (five intervals, K = 2 and K = 3).

Interval	K	Cluster	Simulation Results (cluster size in runs 1–10)										Mean	SD	CV
Interval	K	Cluster	1	2	3	4	5	6	7	8	9	10	Mean	SD	CV
10 Minute	2	C0	140	1	139	140	122	120	1	120	120	1	90.400	59.065	0.689
10 Minute	2	C1	208	347	209	208	226	228	347	228	228	347	257.600	59.065	0.242
10 Minute	3	C0	31	29	1	1	27	1	1	42	17	28	17.800	14.845	0.879
10 Minute	3	C1	154	135	1	138	126	121	138	95	86	157	115.100	43.746	0.401
10 Minute	3	C2	163	184	346	209	195	226	209	211	245	163	215.100	50.039	0.245
1 Hour	2	C0	1	124	136	1	124	124	124	124	137	124	101.900	50.682	0.524
1 Hour	2	C1	347	224	212	347	224	224	224	224	211	224	246.100	50.682	0.217
1 Hour	3	C0	7	91	1	45	29	33	85	1	47	38	37.700	29.920	0.837
1 Hour	3	C1	132	91	132	146	158	157	111	123	124	134	130.800	19.374	0.156
1 Hour	3	C2	209	166	215	157	161	158	152	224	177	176	179.500	25.256	0.148
1 Day	2	C0	141	1	141	1	1	111	141	111	141	1	79.000	64.622	0.862
1 Day	2	C1	207	347	207	347	347	237	207	237	207	347	269.000	64.622	0.253
1 Day	3	C0	87	87	5	63	45	62	4	45	67	1	46.600	31.331	0.709
1 Day	3	C1	94	95	144	112	148	112	134	144	109	143	123.500	20.220	0.173
1 Day	3	C2	167	166	199	173	155	174	210	159	172	204	177.900	18.365	0.109
1 Week	2	C0	142	138	142	138	138	138	138	138	142	142	139.600	1.960	0.015
1 Week	2	C1	206	210	206	210	210	210	210	210	206	206	208.400	1.960	0.010
1 Week	3	C0	83	63	63	76	83	49	83	64	83	83	73.000	11.688	0.169
1 Week	3	C1	93	122	112	104	93	136	93	112	93	96	105.400	14.158	0.142
1 Week	3	C2	172	163	173	168	172	163	172	172	172	169	169.600	3.611	0.022
1 Month	2	C0	142	142	139	142	142	139	142	139	142	139	140.800	1.470	0.011
1 Month	2	C1	206	206	209	206	206	209	206	209	206	209	207.200	1.470	0.008
1 Month	3	C0	80	63	52	52	80	80	80	80	80	53	70.000	12.594	0.190
1 Month	3	C1	95	120	127	127	95	95	95	95	95	125	106.900	14.686	0.145
1 Month	3	C2	173	165	169	169	173	173	173	173	173	170	171.100	2.625	0.016

SD: Standard Deviation, CV: Coefficient of Variation.
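The stability screen described above can be checked by hand for a single row of Table 2. The sketch below is illustrative only: the function name and the 0.5 default are assumptions taken from the text, not the authors' code. When the arithmetic is redone, the SD column appears to match the population standard deviation (divide by n), while the CV column appears to match the sample standard deviation (divide by n − 1) divided by the mean; that reading is an inference from the numbers, not something the source states.

```python
import statistics

def cluster_size_stability(sizes, threshold=0.5):
    """Return mean, population SD, CV (sample SD / mean) and a stability flag.

    The mixed use of population SD and sample SD is an assumption chosen
    because it reproduces the Mean, SD and CV columns of Table 2.
    """
    mean = statistics.fmean(sizes)
    pop_sd = statistics.pstdev(sizes)      # divides by n
    cv = statistics.stdev(sizes) / mean    # sample SD (n - 1) over the mean
    return mean, pop_sd, cv, cv < threshold

# Size of cluster C0 over ten repeated runs, copied from Table 2 (K = 2).
c0_monthly = [142, 142, 139, 142, 142, 139, 142, 139, 142, 139]
c0_10min = [140, 1, 139, 140, 122, 120, 1, 120, 120, 1]

for label, sizes in [("1 Month", c0_monthly), ("10 Minute", c0_10min)]:
    mean, sd, cv, stable = cluster_size_stability(sizes)
    print(f"{label}: mean={mean:.3f} SD={sd:.3f} CV={cv:.3f} stable={stable}")
# 1 Month: mean=140.800 SD=1.470 CV=0.011 stable=True
# 10 Minute: mean=90.400 SD=59.065 CV=0.689 stable=False
```

The monthly row passes the 0.5 screen by a wide margin, whereas the 10-minute row fails because several runs collapse C0 to a single household.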

The effects of temporal aggregation on cluster geometry are visualized in Fig. 5 via two-dimensional PCA projections of standardized consumption-pattern features (K = 2 and K = 3). Separation is visibly tighter and more coherent at the 1-week and 1-month intervals, consistent with the quantitative validity/stability results and the final choice of weekly/monthly settings for forecasting (Fig. 6).

Fig. 5 — Comparison of Electricity Consumption Clustering by Temporal Resolution and Cluster Count.

Fig. 6 — Comparison of Electricity Consumption Clustering by Temporal Resolution and Cluster Count.

Prediction on electric power consumption in empirical apartment complexes by each ML model

Evaluation on forecasting performance for electric power consumption without clustering algorithm (control groups)

The performance of each ML model was evaluated under the four conditions (K2W, K3W, K2M, K3M) established in the clustering stage. Following 10 iterations of prediction and evaluation, the performance differences among the control groups of each model were statistically compared (Table 4). Model performance was measured with MAE, MSE, RMSE, and R² Score. All algorithms (Decision Tree, Random Forest, CatBoost, LightGBM, XGBoost) were tuned by grid search with a unified rolling-origin (forward-chaining) time-series cross-validation of 10 splits on the training window; for boosting models, early stopping on the validation split determined the effective number of boosting rounds (the reported maxima served as upper bounds). The same tuning protocol was applied to the non-clustered baseline and to every per-cluster model, after which final configurations were refit on the full training window and evaluated once on the held-out chronological test window. Hyperparameter search ranges and selected settings are summarized in Table 3.
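To make the rolling-origin (forward-chaining) scheme concrete, the following minimal sketch generates expanding-window splits in plain Python. The function name and the equal-sized validation blocks are assumptions for illustration; the exact fold sizes used in the study are not given. The layout is similar in spirit to scikit-learn's `TimeSeriesSplit`.

```python
def rolling_origin_splits(n_samples, n_splits=10):
    """Yield (train_indices, validation_indices) with an expanding training window.

    Every validation block lies strictly after its training window, so no
    future observation leaks into model fitting.
    """
    fold = n_samples // (n_splits + 1)
    if fold < 1:
        raise ValueError("not enough samples for the requested number of splits")
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * fold
        yield list(range(train_end)), list(range(train_end, train_end + fold))

# Toy example: 24 time steps and 3 splits (the study reports 10 splits).
splits = list(rolling_origin_splits(n_samples=24, n_splits=3))
for train, val in splits:
    print(f"train 0..{train[-1]}  validate {val[0]}..{val[-1]}")
```

Each successive split extends the training window by one block and validates on the block that follows it, which is what distinguishes this scheme from shuffled k-fold cross-validation.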

Table 4.

Performance evaluation of the electric power consumption prediction model for an apartment without clustering (Control Group).

Model	MAE	MSE	RMSE	R² Score
CatBoost	8.107 ± 0.675	15.285 ± 3.136	10.495 ± 1.379	0.914 ± 0.036
Decision Tree	11.479 ± 2.970	26.473 ± 16.962	15.684 ± 4.331	0.822 ± 0.047
LightGBM	8.016 ± 1.113	14.087 ± 7.071	10.394 ± 2.111	0.926 ± 0.011
Random Forest	9.650 ± 1.907	15.195 ± 7.174	12.063 ± 2.535	0.885 ± 0.028
XGBoost	7.835 ± 1.025	15.289 ± 8.300	11.058 ± 2.525	0.920 ± 0.013
ANOVA test

Table 3.

Summary of model parameters and grid search ranges used in prediction Experiments.

Model	Fixed parameters	Range for grid search (cv = 10)
CatBoost	iterations = 500; learning_rate = 0.03; max_ctr_complexity = 6; random_seed = 10; depth = 8; od_type = 'Iter'; od_wait = 25; verbose = 1000	iterations: {100, 500, 1000, 1500, 2000}; learning_rate: {0.01, 0.03, 0.05, 0.07, 0.09}; random_seed: {2, 4, 6, 8, 10}; depth: {2, 4, 6, 8, 10}; l2_leaf_reg: {1, 3, 5, 7, 9}
Decision Tree	criterion = "gini"; splitter = "best"; max_depth = 8; min_samples_split = 5	max_depth: {2, 4, 6, 8, 10}; min_samples_split: {5, 10, 20, 40}
LightGBM	n_estimators = 10000; learning_rate = 0.01; verbose = 0	n_estimators: {100, 500, 1000, 5000, 10000}; learning_rate: {0.01, 0.03, 0.05, 0.07, 0.09}
Random Forest	n_estimators = 200; max_depth = 8; min_samples_leaf = 8; min_samples_split = 8	n_estimators: {100, 200, 500}; max_depth: {6, 8, 10, None}; min_samples_leaf: {1, 2, 4, 6, 8, 10}; min_samples_split: {1, 2, 4, 6, 8, 10}
XGBoost	n_estimators = 1000; learning_rate = 0.01; max_depth = 6; colsample_bytree = 1.0	n_estimators: {100, 500, 1000, 5000}; learning_rate: {0.01, 0.05, 0.1}; colsample_bytree: {0.6, 0.8, 1.0}

Based on the results in Table 4, the four models other than the Decision Tree (CatBoost, LightGBM, Random Forest, and XGBoost) showed no significant difference in MSE and RMSE, whereas the Decision Tree performed worse on both. However, in terms of MAE and R² Score, the gradient-boosting models (CatBoost, LightGBM, and XGBoost) exhibited the highest performance, while the Random Forest and Decision Tree models performed relatively poorly. Consequently, the Decision Tree and Random Forest models were excluded from the control group used to compare the performance of the Ensemble Model.

Performance evaluation on electric power consumption prediction model by each clustering condition

Tables 5, 6, 7 and 8 report model performance under each of the previously selected clustering conditions and compare these results with those of the control group (K0) assessed in Sect. "Data collection and processing" (Table 5: MAE; Table 6: MSE; Table 7: RMSE; Table 8: R² Score).

Table 5.

Comparison of the MAE across models implemented under each clustering condition.

Model	Clustering Conditions					Test
Model						Test
CatBoost	8.107 ± 0.675	3.100 ± 0.014	4.336 ± 0.390	4.356 ± 0.029	4.344 ± 0.377
Decision Tree	11.479 ± 2.970	5.126 ± 0.033	7.171 ± 0.046	7.196 ± 0.030	5.161 ± 0.076
LightGBM	8.016 ± 1.113	3.202 ± 0.006	4.481 ± 0.002	4.353 ± 0.325	3.187 ± 0.043
Random Forest	9.650 ± 1.907	4.249 ± 0.026	5.937 ± 0.013	5.936 ± 0.012	4.235 ± 0.081
XGBoost	7.835 ± 1.025	3.596 ± 0.007	5.100 ± 0.003	5.107 ± 0.010	3.618 ± 0.016
Test

Table 6.

Comparison of mean squared error for models implemented under each clustering condition.

Model	Clustering Conditions					Test
Model						Test
CatBoost	15.285 ± 3.136	4.521 ± 0.092	2.397 ± 0.050	4.446 ± 0.270	4.460 ± 0.251
Decision Tree	26.473 ± 16.962	12.547 ± 0.063	7.461 ± 0.211	12.374 ± 0.173	7.593 ± 0.233
LightGBM	14.087 ± 7.071	5.164 ± 1.016	3.016 ± 0.037	5.484 ± 0.018	3.069 ± 0.161
Random Forest	15.195 ± 7.174	8.914 ± 0.039	5.059 ± 0.133	8.902 ± 0.045	5.222 ± 0.172
XGBoost	15.289 ± 8.300	6.584 ± 0.139	3.580 ± 0.088	6.544 ± 0.118	3.684 ± 0.063
Test

Table 7.

Comparison of RMSE for models implemented under various clustering conditions.

Model	Cluster Conditions					Test
Model						Test
CatBoost	10.495 ± 1.379	5.976 ± 0.040	4.237 ± 0.025	5.949 ± 0.426	5.960 ± 0.407
Decision Tree	15.684 ± 4.331	9.917 ± 0.019	7.078 ± 0.035	9.881 ± 0.052	7.109 ± 0.070
LightGBM	10.394 ± 2.111	6.050 ± 0.466	4.445 ± 0.013	6.227 ± 0.005	4.429 ± 0.058
Random Forest	12.063 ± 2.535	8.107 ± 0.012	5.789 ± 0.027	8.105 ± 0.017	5.778 ± 0.096
XGBoost	11.058 ± 2.525	7.042 ± 0.038	4.957 ± 0.016	7.029 ± 0.029	4.988 ± 0.019
Test

Table 8.

Comparison of R² scores for models implemented under various clustering conditions.

Model	Cluster Conditions					Test
Model						Test
CatBoost	0.914 ± 0.036	0.899 ± 0.001	0.917 ± 0.003	0.896 ± 0.015	0.913 ± 0.013
Decision Tree	0.822 ± 0.047	0.771 ± 0.001	0.796 ± 0.005	0.773 ± 0.003	0.780 ± 0.005
LightGBM	0.926 ± 0.011	0.911 ± 0.015	0.918 ± 0.006	0.908 ± 0.000	0.909 ± 0.023
Random Forest	0.885 ± 0.028	0.845 ± 0.000	0.862 ± 0.005	0.845 ± 0.001	0.844 ± 0.013
XGBoost	0.920 ± 0.013	0.884 ± 0.002	0.892 ± 0.002	0.885 ± 0.001	0.865 ± 0.002
Test

As indicated in Table 5, all models under each clustering condition outperformed the control group in terms of MAE. Specifically, K2M yielded the lowest MAE in all instances (Table 5). Among the models, CatBoost, LightGBM, and XGBoost outperformed the other machine learning models, with no significant difference among the three.

As the MSE outcomes in Table 6 show, the Model by Each Cluster under the K2M condition achieved the best performance across all models relative to the Control Group. As noted previously, the GBM model group (CatBoost, LightGBM, and XGBoost) outperformed the DT model group. Accordingly, CatBoost, LightGBM, and XGBoost recorded higher performance than the other two models, as was the case for MAE.

Similar to the previous metrics, the RMSE of the Model by Each Cluster improved over the Control Group in all cases (Table 7). As with the other metrics, RMSE was lowest under K2M. Additionally, the GBM models (CatBoost, LightGBM, and XGBoost) exhibited the best performance across all error metrics.

The R² Score across all clustering conditions revealed a trend differing from the three Error Metrics (MAE, MSE, and RMSE) discussed earlier (Table 8). The performance of the model by each cluster, based on the R² Score, did not surpass that of the Control Group (K0). Under the K2M clustering condition, specific models (CatBoost, Decision Tree, LightGBM) showed no significant difference from K0, while others (Random Forest, XGBoost) displayed lower performance.
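The gap between the error metrics and R² can be illustrated with the standard definitions of the four metrics. The sketch below uses hypothetical numbers, not data from the study: a sub-series at half the scale with proportionally smaller errors halves MAE and RMSE, yet R² is unchanged because the target variance shrinks by the same factor. This is one arithmetic reason scale-dependent errors and R² can diverge; it is offered as an illustration rather than as the authors' explanation.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R² from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2}

# Hypothetical series: a whole-complex load and a sub-series at half the scale.
whole_true = [10.0, 20.0, 30.0, 40.0]
whole_pred = [12.0, 18.0, 33.0, 37.0]
half_true = [v / 2 for v in whole_true]
half_pred = [v / 2 for v in whole_pred]

whole = regression_metrics(whole_true, whole_pred)
half = regression_metrics(half_true, half_pred)
print(whole["MAE"], half["MAE"])  # 2.5 1.25
print(round(whole["R2"], 3), round(half["R2"], 3))  # 0.948 0.948
```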

Taken together, all experimental groups (modeled by each cluster) demonstrated improved performance relative to the control group for the selected Error Metrics (MAE, MSE, RMSE) and exhibited similar performance for the R² Score. The optimal clustering condition was K2M, in which data were aggregated monthly into two clusters.

Performance evaluation on electric power consumption prediction by ensemble model

An Ensemble model was constructed by integrating the highest-performing predictors from each cluster under the selected condition K2M, and its performance was compared with that of the non-clustered Control Group evaluated on the same set of 348 households. To form candidate ensembles, the per-cluster models were chosen among CatBoost (CB), LightGBM (LGBM), and XGBoost after hyperparameter tuning; deterministic additive aggregation was then applied so that the complex-level load at time t equals the sum of synchronized cluster forecasts, without a second-stage meta-learner. In the comparative analysis, Decision Tree and Random Forest in the Control Group exhibited inferior performance relative to the gradient-boosting models and were excluded from subsequent Control-versus-Ensemble comparisons based on statistical tests (Table 4).
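The deterministic additive aggregation described above amounts to a per-timestamp sum of the cluster forecasts. A minimal sketch follows; the function name and the forecast values are hypothetical, and the per-cluster predictors themselves (for example a tuned CatBoost or LightGBM model) are assumed to exist upstream.

```python
def aggregate_cluster_forecasts(cluster_forecasts):
    """Sum time-aligned per-cluster forecasts into a complex-level forecast."""
    series = list(cluster_forecasts.values())
    horizon = len(series[0])
    if any(len(s) != horizon for s in series):
        raise ValueError("cluster forecasts must cover the same timestamps")
    # zip(*series) groups the forecasts of all clusters for each timestamp.
    return [sum(step) for step in zip(*series)]

forecasts = {
    "C0": [41.0, 39.5, 44.0],  # hypothetical output of the model assigned to C0
    "C1": [63.0, 60.5, 70.0],  # hypothetical output of the model assigned to C1
}
total = aggregate_cluster_forecasts(forecasts)
print(total)  # [104.0, 100.0, 114.0]
```

Because no second-stage learner is fitted, the complex-level forecast can be audited by inspecting each cluster's contribution directly.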

All models operate as one-step-ahead forecasters at the same temporal resolution as the input aggregation (10-min → t + 10 min; 1-h → t + 1 h; 1-d → t + 1 day; 1-w → t + 1 week; 1-m → t + 1 month). In Fig. 7, a day-ahead (24-hour) profile is visualized by recursively applying the one-step-ahead hourly model across the next 24 steps within the held-out chronological test window. All models reproduce the daily shape (morning ramp, midday dip, evening peak), but the ensembles (e.g., CB–LGBM, LGBM–LGBM) track the observed peak more closely and reduce under/over-shoot around the late-afternoon ramp, indicating better peak-load capture.
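Recursive use of a one-step-ahead model can be sketched as follows. This is a simplified illustration: the lag-only window, the `n_lags` parameter, and the toy averaging model are assumptions, and the study's models also take calendar and weather covariates that are omitted here.

```python
def recursive_forecast(one_step_model, history, steps=24, n_lags=3):
    """Roll a one-step-ahead model forward, feeding predictions back as lags."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Drop the oldest lag and append the newest prediction.
        window = window[1:] + [next_value]
    return forecasts

def toy_model(lags):
    """Stand-in predictor (mean of the lag window), not a trained model."""
    return sum(lags) / len(lags)

history = [3.0, 6.0, 9.0]
preds = recursive_forecast(toy_model, history, steps=3)
print(preds[:2])  # [6.0, 7.0]
```

Since each step consumes earlier predictions, errors can compound over the horizon, which is why multi-step profiles are typically harder to fit than the one-step metrics suggest.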

Fig. 7 — Performance Evaluation of CB-Based and LGBM-Based Models for 24-Hour Electricity Consumption.

Figure 8 presents the outcomes of the t-test that compares the performances (MAE, MSE, RMSE, and R² Score) of the Control Group and the Ensemble Models. Across the four metrics, the Ensemble configurations showed clear gains over the Control Group. In particular, the CB–CB, CB–LGBM, LGBM–CB, and LGBM–LGBM combinations formed the final ensemble set, each improving MAE, MSE, and RMSE and also raising R² at the complex level. For the K2M setting, the CB–LGBM family achieved the strongest overall performance, including on complex-level R², while maintaining lower absolute-error measures than the non-clustered baseline (see Tables 5, 6, 7 and 8 for MAE/MSE/RMSE/R² comparisons under clustered vs. K0 conditions).

Fig. 8 — Comparison of metrics between control group and ensemble models (a) MAE, (b) MSE, (c) RMSE, and (d) R² Score.

In each chart of Fig. 8, the vertical axis represents the average and variance of the metrics from the four ML model performances (CatBoost, LightGBM, Random Forest, and XGBoost) within the control group, while the horizontal axis lists each model from both the Ensemble Models and the Control Group in ascending order of the average metric. Symbols denote the significance of the performance difference between each Control Group model and the Ensemble Model: ‘#’ denotes a non-significant difference, whereas ‘*’ indicates a statistically significant difference (p < 0.05). The statistical comparison shows that the Ensemble Model significantly improves all four metrics relative to the ML models of the control group (Fig. 8).
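As a rough illustration of this kind of comparison, the sketch below computes a two-sample t statistic on repeated-run metrics. The source does not state which t-test variant was used, so the Welch (unequal-variance) form here is an assumption, and the MAE values are hypothetical. Converting the statistic to a p-value requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    se_a = statistics.variance(sample_a) / len(sample_a)  # squared standard error
    se_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(sample_a) - 1) + se_b ** 2 / (len(sample_b) - 1)
    )
    return t, df

# Hypothetical MAE values over repeated runs (not taken from the paper).
control_mae = [8.0, 9.0, 7.0, 8.5, 7.5]
ensemble_mae = [3.0, 3.2, 2.8, 3.1, 2.9]

t_stat, dof = welch_t(control_mae, ensemble_mae)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```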

Discussion

Accurate apartment-level electricity demand prediction is hindered by heterogeneous household behavior, pronounced climate-driven variability, and data imperfections that can destabilize downstream clustering and learning. This study presents a practical methodology for analyzing and forecasting consumption in a metropolitan Korean apartment complex by integrating the K-Means clustering algorithm³² with a machine-learning ensemble, grouping households with similar usage profiles and training an optimized model for each cluster. Analyses use two-year time-series of smart-metered power and meteorological covariates. In the metropolitan area of Korea, characterized by a continental climate, seasonal variability and diurnal changes are pronounced⁴⁰, and these climatic characteristics directly shape load patterns. A data-quality audit identified 9.023% of records as outliers or missing. To restore completeness while preserving tractability, same-time average imputation across households was applied first, followed by linear interpolation for residual gaps. This sequence limits instability from long gaps—where simple interpolation performs poorly—and enables consistent feature extraction for clustering and model training. Mean substitution can attenuate extremes, particularly under non-random missingness; however, the operational objective here prioritizes reductions in absolute prediction error for tariffing and system operations. The pre-processed data were reconstructed as instantaneous power series and evaluated at several temporal resolutions. Integrating K-Means with machine-learning forecasters—training an optimized model for each cluster and aggregating predictions—improved error metrics while maintaining ease of implementation and offering a deployable pathway for utilities.
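The two-stage gap filling can be sketched in a few lines. This reflects one reading of "same-time average imputation across households" (the mean of the households observed at the same timestamp); the function name, data layout, and values are hypothetical.

```python
def impute_same_time_then_linear(readings):
    """Fill gaps (None) by same-time household average, then linear interpolation.

    readings maps a household id to an equal-length list of floats or None.
    Gaps at the very start or end of a series with no same-time observation
    are left as None in this sketch.
    """
    households = list(readings)
    n_steps = len(readings[households[0]])
    filled = {h: list(values) for h, values in readings.items()}

    # Stage 1: replace a gap with the mean of households observed at that time.
    for t in range(n_steps):
        observed = [readings[h][t] for h in households if readings[h][t] is not None]
        if observed:
            average = sum(observed) / len(observed)
            for h in households:
                if filled[h][t] is None:
                    filled[h][t] = average

    # Stage 2: linearly interpolate residual gaps between known neighbours.
    for h in households:
        series = filled[h]
        known = [i for i, v in enumerate(series) if v is not None]
        for left, right in zip(known, known[1:]):
            for i in range(left + 1, right):
                weight = (i - left) / (right - left)
                series[i] = series[left] + weight * (series[right] - series[left])
    return filled

readings = {"A": [1.0, None, None, 4.0], "B": [3.0, 5.0, None, 6.0]}
filled = impute_same_time_then_linear(readings)
print(filled["A"])  # [1.0, 5.0, 4.5, 4.0]
print(filled["B"])  # [3.0, 5.0, 5.5, 6.0]
```

In the example, household A's second reading is taken from household B at the same time, while the third timestamp is missing for both households and therefore falls through to interpolation.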

K-Means was adopted to classify household consumption patterns because of its ease of implementation and computational efficiency on large datasets^41–43. To mitigate arbitrariness, the optimal K was selected using a multi-metric protocol (Elbow, Silhouette, Calinski–Harabasz (CHI), and Dunn) applied across five time resolutions^33–36. The Elbow analysis indicated a sharp inertia reduction at small K, after which gains stabilized, defining the candidate range (K = 2 and K = 3)³³. Robustness was then checked with ten replications per setting and the variability (CV) of cluster size; conditions showing instability (e.g., very small clusters) were screened out, yielding four practical settings (K2W, K2M, K3W, K3M) for downstream use. Two design choices increased actionability. First, as K-Means is scale-sensitive, input vectors for clustering were standardized within each candidate resolution, ensuring that distance computations reflect pattern differences rather than unit scales. Second, clustering was performed on consumption patterns rather than mixed weather–load features to keep segments interpretable and inclusive; meteorological variables were retained for the prediction stage (not for defining segments), aligning the segmentation with tariff design and demand-response (DR) targeting while avoiding weather-driven confounding in the cluster geometry. These results contrast with prior clustering–forecasting studies that chose larger K or single-criterion rules and then applied uniform predictors post-clustering^26,27. Here, validated, low-complexity segmentation (notably K = 2 at monthly aggregation) delivered forecasting gains without sacrificing interpretability; importantly, model outputs were integrated by summing cluster-level forecasts to reconstruct complex-level load, rather than by stacking/blending, which simplifies deployment in utility workflows.
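The standardization step can be illustrated with a small z-score helper. This is a generic sketch, not the authors' preprocessing code; the function name and the feature values are hypothetical. Two columns that carry the same pattern at very different scales become identical after scaling, so neither dominates the Euclidean distances used by K-Means.

```python
import math
import statistics

def standardize_columns(rows):
    """Z-score each feature column so it has mean 0 and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, sds)]
        for row in rows
    ]

# Hypothetical features: the same pattern expressed at two different scales.
rows = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
scaled = standardize_columns(rows)
for row in scaled:
    print([round(v, 3) for v in row])
```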

The electric energy usage prediction performance under each clustering condition was evaluated against a non-clustered control (K0) tuned by grid search with 10-fold cross-validation, which served as the baseline for assessing gains from the proposed methodology. Basic models (Decision Tree, Random Forest) and gradient-boosting models (CatBoost, LightGBM, XGBoost) were trained for the Control Group, the Model by Each Cluster, and the Ensemble Model, confirming adaptability across algorithm families. According to Table 4, there was no material difference among the three boosting models in MSE/RMSE, whereas MAE and R² revealed clear underperformance of Decision Tree and Random Forest. This outcome reflects the limits of single-tree learners in capturing complex targets and the superior capacity of gradient boosting, which builds on tree bases but controls overfitting and variance^44,45. In particular, CatBoost’s overfitting-prevention mechanisms⁴⁶ and LightGBM’s Gradient-Based One-Side Sampling (GOSS)⁴⁷ were effective for the high-dimensional feature set (~370 variables) combining weather and household usage, while XGBoost, despite fast parallel training, showed slightly lower generalization than CatBoost/LightGBM⁴⁸. In Sect. "Performance evaluation on electric power consumption prediction model by each clustering condition", cluster-specific models (K2W, K2M, K3W, K3M) improved MAE, MSE, and RMSE relative to K0 across all settings; the largest gains occurred under K2M (MAE 55.552%, MSE 80.091%, RMSE 59.272%), indicating effective grouping of households with similar usage patterns. By contrast, R² for K2M matched K0 for CatBoost/Decision Tree/LightGBM and was slightly lower for Random Forest/XGBoost (Table 8). This divergence is attributed to cluster homogenization: frequent moderate errors are reduced (improving absolute-error metrics) while the system-level variance structure changes little, limiting R² gains. Complex-level forecasts were obtained by summing cluster-level predictions (no stacking, blending, or bagging), a choice that simplifies deployment and preserves the observed error reductions for apartment-level operations.
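Relative error reductions of this kind follow from a simple ratio. The sketch below applies it to a single pair of values, the CatBoost MAE of the control group (8.107, Table 4) and a clustered CatBoost MAE of 3.100 from Table 5. How the aggregate percentages quoted above were averaged across models is not specified, so this single-cell figure is illustrative and is not expected to reproduce them.

```python
def percent_reduction(baseline, improved):
    """Relative error reduction of `improved` versus `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline - improved) / baseline * 100.0

# CatBoost MAE: control group (Table 4) versus one clustered value (Table 5).
reduction = percent_reduction(8.107, 3.100)
print(round(reduction, 1))  # 61.8
```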

In heterogeneous customer portfolios, cluster-wise learning with deterministic additive aggregation (C0 + C1 → complex total) offers a general template for high-fidelity yet auditable forecasting. In the present framework, the best predictor is assigned to each cluster (C0, C1) and the resulting forecasts are combined transparently at the total-load level; this preserves interpretability at deployment and aligns with the accounting identity of demand aggregation. Under the K2M condition, the complex-level R² of the cluster-aware ensemble (10-fold) indicates that most variance in apartment-level load is recovered once cluster structure is learned and predictions are aggregated. On the same unitless metric, reported single-model baselines in related domains are lower: Abumohsen et al.¹⁷ (utility dataset) with RF; Yin et al.¹⁸ (equipment energy) with LightGBM; and Zhang et al.¹⁹ (short-term load) with an XGBoost-AOA hybrid, underscoring task-granularity effects. The CatBoost + LightGBM pairing is preferred on both technical and empirical grounds. CatBoost reduces target leakage via ordered boosting and handles missing/categorical signals robustly, curbing overfitting in rich feature spaces; LightGBM improves sample/feature efficiency through GOSS/EFB, lowering variance without sacrificing bias. These complementary inductive biases are well-matched to the study’s high-dimensional predictors (apartment-level usage plus meteorological covariates), yielding more stable generalization than alternatives that substitute a different tree booster under the same inputs. Importantly, the ensemble’s R² gain relative to cluster-only models is not merely a numerical improvement: it reflects a two-stage effect, (i) within-cluster homogenization that suppresses frequent moderate errors and (ii) aggregation-level variance reconstruction that restores system-level structure lost when clusters are evaluated in isolation.

However, the dataset used in this investigation was acquired from an apartment complex in the metropolitan area of Korea, representing a specific housing type; external validity is therefore bounded to environments with similar characteristics. In addition, variables related to date, day of the week, and weather were major input features; detailed building-physics and social/occupancy attributes (e.g., spatial volume, wall thickness, number of windows, household size and changes due to move-ins/outs) were unavailable. Data completeness was restored via same-time average imputation followed by short-gap linear interpolation, which can attenuate extremes at imputed timestamps. Future work could expand generalizability by considering variations in power-usage patterns across regions (latitude-driven climate differences) and other residential types (single-family homes, multi-family dwellings, row houses, officetels). Acquiring richer building and occupancy data—subject to privacy and access constraints—offers a path to further accuracy gains. Despite these constraints, the proposed methodology demonstrated high predictive accuracy in specific environments, providing a foundation for effective application where similar characteristics hold. Conclusively, the clustering-based ensemble framework maps naturally to utility practice—supporting utility load forecasting and smart-grid optimization through segmentable tariff design, targeted demand-response, and PV/ESS scheduling—while remaining deployable without additional data collection or complex second-stage modeling.

Conclusion

This study proposes an efficient power-demand prediction framework that clusters households by energy-use patterns and integrates the optimal model for each cluster into a cohesive ensemble, delivering accurate, deployment-ready forecasts without additional data collection or complex meta-learning. In the metropolitan-Korea demonstration, the approach reduced errors by 55.6% (MAE), 80.1% (MSE), and 59.3% (RMSE) versus the non-clustered baseline while recovering most of the variance in complex-level load (R²). The contribution is to combine quantitative cluster validity with cluster-aware model assignment and transparent (additive) aggregation that reconstructs building-level demand, thereby capturing seasonal and daily fluctuations in continental-climate regions and outperforming traditional predictors that ignore intermediate clustering. In practical terms, the framework provides a usable pathway for smart grids and energy management systems: utilities can employ it for peak-load forecasting, demand-side management, and tariff design; building managers can target HVAC scheduling and efficiency actions by segment; and policymakers can design evidence-based demand-response and incentive programs that promote sustainability, cost reduction, and grid resilience.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (13.1 MB, csv)

Acknowledgements

This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) under the metaverse support program to nurture the best talents (IITP-2025-RS-2023-00254529) grant funded by the Korea government (MSIT).

Author contributions

Taeyong Sim and Sanghyun Ryu conceived the study; designed the experiments; collected and preprocessed the data; performed data analysis and predictive modeling; coordinated adjustments based on experimental results; collated the final dataset; and drafted and critically revised the manuscript. Dongjun Lee provided critical feedback through in-depth review of the manuscript and drafted and critically revised it. Sujin Lee and Chang-Jae Chun provided critical feedback through in-depth review of the manuscript. Hyeonjoon Moon reviewed the revised manuscript and gave final approval for submission to Scientific Reports.

Data availability

The electric usage data that support the findings of this study are available from the Korea Institute of Energy Research (KIER), but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the Korea Institute of Energy Research (KIER). The weather datasets generated and/or analysed during the current study are available in the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) repository, https://data.kma.go.kr.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Taeyong Sim and Sanghyun Ryu contributed equally to this work.

References

1.Cetina, Q., Roscoe, R. A. J. & Wright, P. S. Challenges for smart electricity meters due to dynamic power quality conditions of the grid: A review. In 2017 IEEE International Workshop on Applied Measurements for Power Systems (AMPS) 1–6 (IEEE, 2017)
2.Soares, A., Gomes, Á. & Antunes, C. H. Categorization of residential electricity consumption as a basis for the assessment of the impacts of demand response actions. Renew. Sustain. Energy Rev.30, 490–503 (2014). [Google Scholar]
3.Keles, D. & Yilmaz, H. Ü. Decarbonisation through coal phase-out in Germany and Europe—Impact on Emissions, electricity prices and power production. Energy Policy. 141, 111472 (2020). [Google Scholar]
4.Andersen, A. D. & Gulbrandsen, M. The innovation and industry dynamics of technology phase-out in sustainability transitions: insights from diversifying petroleum technology suppliers in Norway. Energy Res. Social Sci.64, 101447 (2020). [Google Scholar]
5.Zheng, G., Li, K. & Wang, Y. The effects of high-temperature weather on human sleep quality and appetite. Int. J. Environ. Res. Public Health. 16 (2), 270 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Darçın, M. Association between air quality and quality of life. Environ. Sci. Pollut. Res.21 (3), 1954–1959 (2014). [DOI] [PubMed] [Google Scholar]
7.Balbus, J. et al. Introduction: Climate Change and Human Health. The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment. 25–42 ( U.S. Global Change Research Program, 2016). 10.7930/J0VX0DFW
8.Sarofim, M. C. et al. Temperature-Related Death and Illness. The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment 43–68 (U.S. Global Change Research Program, 2016). 10.7930/J0MG7MDX
9.Ebi, K. L. et al. Hot weather and heat extremes: health risks. Lancet398 (10301), 698–708 (2021). [DOI] [PubMed] [Google Scholar]
10.Anvari-Moghaddam, A., Guerrero, J. M., Vasquez, J. C., Monsef, H. & Rahimi‐Kian, A. Efficient energy management for a grid‐tied residential microgrid. IET Gener. Transm. Distrib.11(11), 2752–2761 (2017). [Google Scholar]
11.Moore, F. Environmental Control Systems: Heating, Cooling, Lighting (1993).
12.Moon, J., Park, S., Rho, S. & Hwang, E. Robust Building energy consumption forecasting using an online learning approach with R ranger. J. Building Eng. Volume. 47, 2352–7102. 10.1016/j.jobe.2021.103851 (2022). [Google Scholar]
13.Korea Research Institute for Human Settlements. Korea Housing Survey, 2021 (Accessed 31st July 2023) (2021).
14.Wen, L., Zhou, K. & Yang, S. Load demand forecasting of residential buildings using a deep learning model. Electr. Power Syst. Res.179, 0378–7796. 10.1016/j.epsr.2019.106073 (2020). [Google Scholar]
15.Mel Keytingan, M., Shapi, N. A., Ramli, Lilik, J. & Awalin,. Energy consumption prediction by using machine learning for smart building: Case study in Malaysia. Dev. Built Environ.5, 100037. 10.1016/j.dibe.2020.100037 (2021). [Google Scholar]
16.Nivethitha Somu, Gauthama Raman, M. R. & Krithi Ramamritham A hybrid model for Building energy consumption forecasting using long short term memory networks. Appl. Energy. 261, 0306–2619. 10.1016/j.apenergy.2019.114131 (2020). [Google Scholar]
17.Abumohsen, M., Owda, A. Y. & Owda, M. Electrical load forecasting based on random forest, xgboost, and linear regression algorithms. In 2023 International Conference on Information Technology(ICIT) 25–31 (IEEE, 2023)
18.Yin, Z. et al. Pump feature construction and electrical energy consumption prediction based on feature engineering and LightGBM algorithm. Sustainability15 (1), 789 (2023). [Google Scholar]
19.Zhang, L. & Jánošík, D. Enhanced short-term load forecasting with hybrid machine learning models: catboost and XGBoost approaches. Expert Syst. Appl.241, 122686 (2024). [Google Scholar]
20.Kim, D., Yim, T. & Lee, J. Y. Analytical study on changes in domestic hot water use caused by COVID-19 pandemic. Energy231, 120915 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Mardani, A., Liao, H., Nilashi, M., Alrasheedi, M. & Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod.275, 122942 (2020). [Google Scholar]
22.Morteza, A. et al. Deep learning hyperparameter optimization: application to electricity and heat demand prediction for buildings. Energy Build.289, 113036 (2023). [Google Scholar]
23.González-Vidal, A., Mendoza-Bernal, J., Niu, S., Skarmeta, A. F. & Song, H. A transfer learning framework for predictive energy-related scenarios in smart buildings. IEEE Trans. Ind. Appl.59 (1), 26–37 (2022). [Google Scholar]
24.Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science349 (6245), 255–260 (2015). [DOI] [PubMed] [Google Scholar]
25.Kraus, M., Feuerriegel, S. & Oztekin, A. Deep learning in business analytics and operations research: Models, applications and managerial implications. Eur. J. Oper. Res.281 (3), 628–641 (2020). [Google Scholar]
26.Liu, H., Liu, Y., Huang, H., Wu, H. & Huang, Y. Energy consumption dynamic prediction for HVAC systems based on feature clustering deconstruction and model training adaptation. Build. Simul. 17, 1439–1460 (2024).
27.Han, F., Pu, T., Li, M. & Taylor, G. Short-term forecasting of individual residential load based on deep learning and K-means clustering. CSEE J. Power Energy Syst.7 (2), 261–269 (2020). [Google Scholar]
28.Li, K., Zhang, J., Chen, X. & Xue, W. Building’s hourly electrical load prediction based on data clustering and ensemble learning strategy. Energy Build.261, 111943 (2022). [Google Scholar]
29.Culaba, A. B., Rosario, D., Ubando, A. J. R., Chang, J. S. & A. T., & Machine learning-based energy consumption clustering and forecasting for mixed‐use buildings. Int. J. Energy Res.44 (12), 9659–9673 (2020). [Google Scholar]
30.Zhao, Q., Xu, M. & Fränti, P. Sum-of-squares based cluster validity index and significance analysis. In International conference on adaptive and natural computing algorithms 313–322 (Springer Berlin Heidelberg, 2009)
31. Open MET Data Portal. Meteorological data (2015). Retrieved December 1, 2024, from https://data.kma.go.kr
32. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics 5, 281–298 (University of California Press, 1967).
33. Thorndike, R. L. Who belongs in the family? Psychometrika 18 (4), 267–276 (1953).
34. Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
35. Caliński, T. & Harabasz, J. A dendrite method for cluster analysis. Commun. Statistics-theory Methods 3 (1), 1–27 (1974).
36. Dunn, J. C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters (1973).
37. Bholowalia, P. & Kumar, A. EBK-means: A clustering technique based on elbow method and k-means in WSN. Int. J. Comput. Appl. 105 (9), 17–24 (2014).
38. Kosowski, P., Kosowska, K. & Janiga, D. Primary energy consumption patterns in selected European countries from 1990 to 2021: a cluster analysis approach. Energies 16 (19), 6941 (2023).
39. Brown, C. E. Applied Multivariate Statistics in Geohydrology and Related Sciences 155–157 (Springer, Berlin, Heidelberg, 1998).
40. Kottek, M., Grieser, J., Beck, C., Rudolf, B. & Rubel, F. World map of the Köppen-Geiger climate classification updated (2006).
41. Pham, D. T., Dimov, S. S. & Nguyen, C. D. Selection of K in K-means clustering. Proc. Inst. Mech. Eng. Part C 219 (1), 103–119 (2005).
42. Bock, H. H. Clustering methods: A history of k-means algorithms. In Selected Contributions in Data Analysis and Classification (eds Brito, P. et al.) 161–172 (Springer, Berlin, Heidelberg, 2007).
43. Wu, J. Advances in K-means clustering: a data mining thinking (Springer Science & Business Media, 2012).
44. Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
45. Dev, V. A. & Eden, M. R. Gradient boosted decision trees for lithology classification. In Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design (FOCAPD 2019) (eds Garcia Muñoz, S. et al.), Vol. 47, 113–118 (Elsevier, 2019).
46. Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31, 6638–6648 (2018).
47. Ke, G. et al. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
48. Chen, T. & Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).

Associated Data

Supplementary Materials

Supplementary Material 1 (13.1 MB, csv)
