Scientific Reports. 2025 Nov 17;15:40172. doi: 10.1038/s41598-025-23978-w

A machine learning ensemble framework based on a clustering algorithm for improving electric power consumption performance

Taeyong Sim 1,#, Sanghyun Ryu 1,#, Dongjun Lee 1, Sujin Lee 1, Chang-Jae Chun 1, Hyeonjoon Moon 2,
PMCID: PMC12623421  PMID: 41249310

Abstract

Accurate prediction of electric energy consumption is critical for both user convenience and supplier efficiency. This study introduces an ensemble approach that integrates clustering algorithms with machine learning (ML) models to enhance prediction accuracy by identifying consumption patterns within buildings. The research focused on residential apartments in the metropolitan area of Korea, utilizing four evaluation methods (Elbow-Method, Silhouette Score, Calinski-Harabasz Index, and Dunn Index) across five data collection intervals (10 min, 1 h, 1 day, 1 week, and 1 month). Five ML models (CatBoost, Decision Tree, LightGBM, Random Forest, XGBoost) were assessed for their prediction performance across clusters. Additionally, ML models that exhibited high performance within each cluster were amalgamated into an ensemble model to assess the predictive performance regarding total electric energy consumption at the research site. Optimal clustering resulted in two clusters (142 houses for C0, 206 houses for C1) using monthly resampled power data. CatBoost and LightGBM exhibited the highest average prediction performance. Based on the possible combinations of the two models applied to each cluster, four ensemble models were developed: CB-CB, CB-LGBM, LGBM-CB, and LGBM-LGBM. Statistical analysis confirmed that all ensemble models significantly outperformed the control group’s traditional ML approaches without clustering (p < 0.05 or 0.01). The proposed clustering-based ML ensemble model in this study can predict the energy consumed in buildings more accurately by accounting for the unique consumption pattern of each house. It is anticipated to contribute effectively to energy consumption reduction.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-23978-w.

Keywords: Electrical energy, Cluster, Machine learning, Ensemble model, Optimization

Subject terms: Engineering, Environmental sciences, Mathematics and computing

Introduction

Electric power energy refers to the capability of electrical energy to perform work, measured in terms of current per unit time or the electrical energy transferred or converted per unit time1. This energy is crucial in maintaining residential environments through Heating, Ventilation, and Air Conditioning (HVAC) systems and various household appliances. The electric energy consumed in residential settings represents a significant portion of total energy usage2, and this consumption has been rising due to the deployment of electric-based indoor environment systems motivated by the push for decarbonization3,4. However, unconditional regulations and conservation measures that ignore residents’ electricity needs might compromise the quality of living and working environments5–9. Therefore, accurate prediction of power energy consumption is necessary, as such predictions can enhance user convenience, support efficient management and operation of energy systems, and contribute to reducing carbon emissions10.

Power consumption in residential buildings exhibits varied usage patterns based on factors such as weather, surrounding environment, consumer electronics usage patterns, number of residents, and type of residence11. This variability is particularly pronounced in South Korea, where seasonal and periodic characteristics according to temperature metrics by periods are observed12,13, making electric energy consumption highly sensitive to external conditions such as weather. The continental climate of Korea’s metropolitan areas creates distinct seasonal variations with hot summers and cold winters, leading to significant fluctuations in HVAC system usage throughout the year. While engineering-based methods involving mathematical modeling of buildings14 are often complex and time-consuming, Artificial Intelligence (AI)-based methods utilize historical and current energy usage data to forecast future needs with greater accuracy, efficiency, and practicality15,16. Therefore, this study utilizes an AI-based approach to accurately predict electric power consumption for maintaining residential environments in Korean apartment complexes.

Research on artificial intelligence-based methodologies for electric energy usage prediction has been continuously advancing. Various Machine Learning (ML) algorithms have been employed to achieve accurate power consumption predictions, with gradient boosting models showing particularly strong performance. Abumohsen, M. et al.17 demonstrated that Random Forest models (R² Score = 0.877) outperformed XGBoost (R² Score = 0.811) and linear regression (R² Score = 0.637) for power usage prediction. Yin Z. et al.18 showed LightGBM achieved superior performance (R² Score = 0.930) compared to other ML algorithms for equipment power consumption. Zhang, L. et al.19 compared hybrid models derived from CatBoost, LightGBM, and XGBoost for short-term load forecasting, with the XGBoost-AOA model achieving the highest performance (R² Score = 0.922). These studies collectively indicate that tree-based ensemble algorithms, and gradient boosting models in particular, can reach R² Scores above 0.90 in energy consumption forecasting. Furthermore, enhancement methods including optimal input variable construction20–22, hyperparameter adjustments23, and transfer learning23 have been proposed. However, these approaches often require extensive data collection and incur significant time and resource costs while carrying risks of ineffectiveness with incomplete data24,25.

Analyzing power energy usage by clusters with similar consumption patterns can enhance prediction performance without requiring additional data collection or complex model development26. This approach addresses the heterogeneity in residential energy consumption by grouping households with similar usage behaviors, enabling more targeted predictions. Han, F. et al.27 introduced a short-term prediction method employing K-Means clustering combined with pooling deep RNN (PDRNN) for residential load forecasting. Their approach achieved MAE of 3.62% and RMSE of 1.66% for 920 customers in Ireland, significantly outperforming traditional models including ARIMA, RNN, SVR, and DRNN. Li, K. et al.28 applied K-Means clustering to forecast short-term power load of buildings, demonstrating that clustering-based ensemble learning improved model generalization. Their PSO-ELM model showed substantial improvement, with MAPE falling from 1.49 to 1.06 after clustering was introduced. Similarly, Culaba, A. B. et al.29 classified buildings based on consumption patterns and peaks, achieving a 46% reduction in Mean Bias Error (MBE) and a 10% reduction in Root Mean Square Error (RMSE) compared to non-clustered methods. These studies utilized various performance metrics including MAE, MAPE, RMSE, and MBE to evaluate prediction accuracy, demonstrating the effectiveness of clustering approaches. However, previous studies did not include systematic processes for quantitatively evaluating optimal clustering parameters, often relying on arbitrary selection of cluster numbers without comprehensive validation. Furthermore, these studies overlooked the potential of leveraging cluster-specific models to improve overall building or complex-wide consumption predictions30. This limitation presents an opportunity to develop a more robust clustering-based ensemble framework that optimizes both clustering parameters and model selection.

This study introduces a novel ensemble framework that systematically integrates optimized clustering with machine learning to predict electric power consumption in Korean residential apartments. Unlike previous studies that lacked systematic optimization or applied single models across all clusters, this framework ensures methodological rigor through quantitative validation while achieving practical improvements in consumption pattern identification and prediction accuracy. The proposed methodology advances existing research through three key innovations: (1) quantitative optimization of clustering parameters using four evaluation metrics (Elbow-Method, Silhouette Score, Calinski-Harabasz Index, and Dunn Index) across multiple time intervals to identify optimal clustering conditions, (2) development of cluster-specific ensemble models by selecting and combining the best-performing ML algorithms (CatBoost, LightGBM, XGBoost) for each cluster rather than applying uniform models, and (3) empirical validation on 348 households in Korea’s metropolitan apartment complexes, addressing the unique energy consumption patterns in regions with distinct seasonal variations.

Method

Environment and proposal framework

The research process, which analyzes the power energy usage pattern of all households in the empirical apartment complexes and predicts power consumption in clustered households and entire complexes, is depicted in Fig. 1. The dataset comprises smart-meter readings for each household at 10-minute intervals, reconstructed into instantaneous-usage series at five aggregation levels (10 min, 1 h, 1 day, 1 week, and 1 month); a quality audit screens records for timestamp gaps and outliers, and a two-stage imputation restores completeness—same-time cross-household averaging for long gaps followed by short-gap linear interpolation—thereby preserving diurnal/seasonal structure and improving clustering stability.

Fig. 1. Proposed framework in this study.

For clustering, consumption-pattern feature vectors are formed within each candidate resolution and standardized with StandardScaler (mean zero, unit variance) only for the K-Means step, whereas temporal and meteorological variables are appended after clustering for supervised prediction (i.e., K-Means uses no mixed weather–load vectors to avoid cross-domain scale effects). The number of clusters K and the aggregation interval are selected per resolution by fitting K-Means and applying a combined validity–stability protocol—Elbow (first sharp drop in inertia) together with maximization of Silhouette, Calinski–Harabasz, and Dunn indices and a 10-run stability check (cluster-size coefficient of variation < 0.5 with no tiny clusters)—and the resulting (interval, K) setting carries forward to forecasting (“Clustering” in Fig. 1). Predictive performance is then evaluated under a unified, time-aware training procedure: the non-clustered baseline and the cluster-specific predictors (Decision Tree, Random Forest, CatBoost, LightGBM, XGBoost) undergo grid search with rolling-origin (forward-chaining) time-series cross-validation using 10 splits, boosting methods employ early stopping on the validation block, final models are refit on the full training window, and a single evaluation is conducted on the held-out chronological test window. Complex-level demand is obtained by deterministic summation of synchronized cluster forecasts without a second-stage meta-learner (no stacking, blending, or bagging). Performance is compared with the traditional prediction method that does not involve clustering (“Data Analysis (General Method)”) using MAE, MSE, RMSE, and R2, with inferential testing by ANOVA with post-hoc comparisons and, where appropriate, independent t-tests.

Data collection and processing

In this study, electric power energy usage data were recorded for 348 households in residential apartment complexes in the Republic of Korea. This dataset was collected by the Korea Institute of Energy Research (KIER) between 23:20 on July 17, 2022 and 15:30 on June 5, 2024. The dataset was provided exclusively for research purposes under institutional data-sharing agreements and is not publicly available. Data were gathered every 10 min from smart meters installed in each household, and the cumulative power energy consumption for each household was formulated as Eq. (1).

$$\mathrm{usage\_ACCU}_h(t_n) = \sum_{t=t_0}^{t_n} \mathrm{Wh}_h(t) \tag{1}$$

In Eq. (1), $h$ denotes an individual household within the complex. The watt-hour [Wh], a unit of energy, measures the electricity consumed by each household from the starting point $t_0$ to the ending point $t_n$. To predict energy consumption using the ML model, the power consumption data were converted from integrated (cumulative) to instantaneous usage. The instantaneous consumption of power energy by each household was calculated using Eq. (2): the instantaneous consumption $\mathrm{usage\_INST}_h(t)$ was determined as the difference between the integrated usage ($\mathrm{usage\_ACCU}_h$) at the two consecutive time points $t-1$ and $t$.

$$\mathrm{usage\_INST}_h(t) = \mathrm{usage\_ACCU}_h(t) - \mathrm{usage\_ACCU}_h(t-1) \tag{2}$$
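The conversion in Eq. (2) can be illustrated with a minimal sketch in plain Python. The function name `to_instantaneous` and the sample meter values are hypothetical, and marking negative differences as missing is an assumption based on the preprocessing rule that negative instantaneous values are invalid; this is not the authors' code.

```python
from typing import List, Optional


def to_instantaneous(cumulative: List[float]) -> List[Optional[float]]:
    """Difference consecutive cumulative readings, as in Eq. (2).

    The first reading has no predecessor, so the output is one element
    shorter than the input. Negative differences are returned as None
    so that a later imputation step can treat them as missing.
    """
    instantaneous: List[Optional[float]] = []
    for previous, current in zip(cumulative, cumulative[1:]):
        diff = current - previous
        instantaneous.append(diff if diff >= 0 else None)
    return instantaneous


# Hypothetical cumulative readings for one household (10-minute steps).
readings = [100.0, 100.5, 101.25, 101.0, 102.0]
instantaneous = to_instantaneous(readings)
print(instantaneous)
```

The fourth reading is lower than the third, so that step is flagged as missing instead of being kept as a negative consumption value.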

Preprocessing addressed outliers and missing values, and the data were segmented into five aggregation intervals (10 min, 1 h, 1 day, 1 week, and 1 month) to assess the effect of temporal resolution on clustering and forecasting. Outliers typically included records outside the meter’s valid range, timestamps beyond the measurement period, invalid household identifiers, or negative instantaneous values; such records were removed, and completeness was then restored by a two-stage imputation sequence that first replaced long gaps for a given household with the same-time average across other households (thereby preserving seasonal and calendar effects) and subsequently applied linear interpolation to residual short gaps to maintain temporal continuity. This ordering limits instability arising from long gaps—where simple interpolation performs poorly—and preserves diurnal/seasonal structure needed for feature construction and downstream modeling.
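The two-stage imputation order described above can be sketched as follows. This is a simplified illustration in plain Python, not the authors' implementation: the `long_gap` threshold that separates long from short gaps is a hypothetical parameter (the text does not state one), and the household series are made-up values.

```python
from typing import Dict, List, Optional

Series = List[Optional[float]]


def gap_lengths(series: Series) -> List[int]:
    """For each position, the length of the missing run it belongs to (0 if present)."""
    lengths = [0] * len(series)
    i = 0
    while i < len(series):
        if series[i] is None:
            j = i
            while j < len(series) and series[j] is None:
                j += 1
            for k in range(i, j):
                lengths[k] = j - i
            i = j
        else:
            i += 1
    return lengths


def impute(households: Dict[str, Series], long_gap: int = 3) -> Dict[str, Series]:
    filled = {h: list(s) for h, s in households.items()}
    # Stage 1: long gaps take the same-time mean of the other households.
    for h, series in households.items():
        for t, run in enumerate(gap_lengths(series)):
            if run >= long_gap:
                others = [s[t] for o, s in households.items()
                          if o != h and s[t] is not None]
                if others:
                    filled[h][t] = sum(others) / len(others)
    # Stage 2: remaining gaps between known points are linearly interpolated.
    for series in filled.values():
        known = [t for t, v in enumerate(series) if v is not None]
        for left, right in zip(known, known[1:]):
            for t in range(left + 1, right):
                frac = (t - left) / (right - left)
                series[t] = series[left] + frac * (series[right] - series[left])
    return filled


data = {
    "A": [1.0, None, None, None, 5.0, 6.0],   # long gap (3 steps)
    "B": [2.0, 2.0, 4.0, 4.0, None, 8.0],     # short gap (1 step)
    "C": [4.0, 4.0, 6.0, 8.0, 6.0, 10.0],
}
result = impute(data)
print(result["A"], result["B"])
```

Running the cross-household stage first means the long gap in household A is filled from values observed at the same timestamps, while the single missing point in household B is bridged by interpolation between its own neighbours.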

Weather data from the Korea Meteorological Administration’s Open MET Data Portal31 were used as inputs for supervised prediction. The KMA provides hourly observations from 105 ground stations nationwide; the candidate set comprised 20 fields (external temperature, ground temperature, dew-point temperature, humidity, rainfall, snowfall, snowfall in the last 3 h, wind speed, wind direction, vapor pressure, local atmospheric pressure, sea-level atmospheric pressure, sunshine, solar radiation, total cloud cover, mid-low cloud cover, low-level cloud cover, cloud shape, visibility, and ground state). Of these, 17 variables were retained, excluding cloud shape, low-level cloud cover, and ground state. The exclusion is justified by a correlation heat-map and principal component analysis conducted over the candidate weather features, which showed that the three descriptors are highly collinear with retained radiative and cloud-amount proxies (sunshine/solar radiation; total and mid–low cloud cover) and contribute negligible unique variance in leading components. Retaining physically direct radiative/thermodynamic drivers (temperature, humidity, pressures, sunshine/solar radiation, cloud amounts, wind) and omitting qualitative or overlapping codes reduces redundancy, avoids encoding heterogeneity, and limits noise propagation in the supervised learners. Table 1 summarizes the input variables (Date, Weather, House ID) and the dependent variable (Electric Power Usage).

Table 1. Data sets collected from the empirical area by open MET data Portal.

| Type | Column | Measurement Range | Unit | Description |
|---|---|---|---|---|
| Date | YEAR | 2022–2024 | - | - |
| Date | MONTH | 1, 2, 3, …, 12 | - | - |
| Date | DAY | 1, 2, 3, …, 31 | - | - |
| Date | HOUR | 1, 2, 3, …, 23 | - | - |
| Date | MINUTE | 1, 2, 3, …, 59 | - | - |
| Date | DAY_OF_THE_WEEK | Mon, Tue, Wed, …, Sun | - | - |
| Weather | temp_outdoor | Continuous | | Outdoor temperature |
| Weather | temp_dew_point | Continuous | | Dew point temperature |
| Weather | temp_ground | Continuous | | Ground temperature |
| Weather | humidity | Continuous | % | Humidity |
| Weather | rainfall | Continuous | mm | Rainfall amount |
| Weather | snowfall | Continuous | cm | The amount of snowfall |
| Weather | snowfall_3hr | Continuous | cm | The amount of snowfall in the past three hours |
| Weather | wind_speed | Continuous | m/s | Wind speed |
| Weather | wind_direction | 0, 1, 2, …, 16 | Compass directions spanning 16 points | Wind direction |
| Weather | pressure_vapor | Continuous | hPa | Vapor atmospheric pressure |
| Weather | pressure_area | Continuous | hPa | Observatory atmospheric pressure |
| Weather | pressure_sea | Continuous | hPa | Sea level atmospheric pressure |
| Weather | sunshine | Continuous | MJ (Mega Joule) | Amount of solar insolation |
| Weather | solar_radiation | Continuous | h | Incidence of sunlight |
| Weather | cloud_total | 0, 1, 2, …, 10 | - | Cloud coverage |
| Weather | cloud_midlow | 0, 1, 2, …, 10 | - | Mid-low layer cloud coverage |
| Weather | visual_range | Continuous | 10 m | Visibility range |
| Residence | HOUSE_ID_BUILDING | 0, 1, 2 | - | Building in a residential complex |
| Residence | HOUSE_ID_FLOOR | 0, 1, 2, …, 24 | - | Building floor index |
| Residence | HOUSEHOLD_ID | 0, 1, 2, …, 348 | - | Unique identifier for each household |
| Energy Usage | usage_ACCU_h | Continuous | Electricity: kWh | Total energy consumption |
| Energy Usage | usage_INST_h | Continuous | Electricity: kWh | Real-time energy consumption |

Method of clustering each household according to the pattern of power energy consumption in apartment complexes in the empirical area

The K-Means clustering technique32 was utilized to cluster the data for each household based on its power consumption pattern within the empirical apartment complex; the algorithm determines the number of clusters according to the hyperparameter K, and updates the cluster centroids to minimize the within-cluster sum of squared Euclidean distances, as expressed by

$$J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2, \qquad \mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x \tag{3}$$

where $C_k$ denotes the set of households assigned to cluster $k$, $|C_k|$ its size, $x$ the consumption-pattern feature vector, and $\mu_k$ the centroid of cluster $k$. Because K-Means relies on Euclidean distances and is therefore scale-sensitive, the feature vectors used for clustering (consumption-pattern vectors constructed within each candidate temporal resolution) were standardized to zero mean and unit variance using a StandardScaler (scikit-learn library), so that distances reflect pattern differences rather than measurement scales; the transform is

$$z = \frac{x - u}{s} \tag{4}$$

where $u$ and $s$ are the mean and standard deviation of the corresponding feature.
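To make the standardization step and the K-Means objective concrete, the following plain-Python sketch computes per-feature z-scores and the within-cluster sum of squares for a given assignment. The helper names and the toy feature matrix are hypothetical; the study itself uses scikit-learn, and the population standard deviation is used here because that is what StandardScaler applies.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-score using the population standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(d)]
    # A constant column carries no information; map it to zeros.
    return [[(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(d)] for r in rows]


def inertia(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Within-cluster sum of squared Euclidean distances to each centroid."""
    total = 0.0
    for k in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == k]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - c) ** 2 for x, c in zip(r, centroid))
                     for r in members)
    return total


# Two hypothetical features on very different scales.
features = [[1.0, 100.0], [3.0, 300.0], [5.0, 500.0], [7.0, 700.0]]
labels = [0, 0, 1, 1]

z = standardize(features)
print(inertia(features, labels))  # dominated by the large-scale column
print(inertia(z, labels))         # both columns contribute equally
```

Before scaling, the second column accounts for almost all of the objective; after scaling, the two columns contribute the same amount, which is the reason the text gives for standardizing before K-Means.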

Clustering was performed on consumption-pattern features only to avoid cross-domain scale confounding, while temporal and meteorological variables were reserved for the subsequent supervised prediction stage. To ascertain the optimal K (number of clusters), the Elbow-method33, Silhouette Score34, Calinski–Harabasz Index (CHI)35, and Dunn Index36 were employed and compared across a specified range Inline graphic at each of five temporal resolutions (10-min, 1-h, 1-d, 1-w, 1-m); the Elbow analysis identified the first sharp drop in inertia (e.g., inertia falling below Inline graphic of the initial value by Inline graphic), while the three clustering quality indices were used jointly to maximize separation and cohesion in a manner that mitigates the subjectivity of a single criterion. In addition, clustering stability was examined by performing 10 independent runs per (interval, K) setting and computing the coefficient of variation of cluster sizes. Settings were retained only if the cluster-size coefficient of variation was below 0.5 and no tiny or volatile clusters appeared, after which the optimal conditions were established. Given that Euclidean K-Means is not shift-invariant at sub-daily horizons and can under-represent load-shape or phase differences, the choice of resampling interval was treated as part of the model design and assessed empirically by the multi-metric validity and stability procedure above; weekly/monthly aggregation yielded higher validity scores and stable partitions and thus was retained for downstream modeling, with the acknowledged limitation that sub-daily phase information is attenuated at coarser resolutions.
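Of the validity indices above, the Dunn Index is the least commonly packaged, so a short sketch may help. The version below uses one common definition (smallest distance between points of different clusters divided by the largest within-cluster diameter); the paper does not state which variant it applied, so this choice and the toy points are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dunn_index(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Minimum inter-cluster distance divided by maximum intra-cluster diameter."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between points of two different clusters.
    min_separation = min(
        math.dist(a, b)
        for members_a, members_b in combinations(clusters.values(), 2)
        for a in members_a
        for b in members_b
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


# Hypothetical 2-D points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = dunn_index(points, [0, 1, 0, 1])   # stretched, interleaved clusters
print(good, bad)
```

A higher value indicates tighter and better-separated clusters, which is why the index is maximized alongside the Silhouette Score and CHI.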

Clustering was executed under the optimal conditions identified through the process described above, and the power energy prediction performance of five ML models was assessed based on the clustering outcomes; the predictive accuracy of each model was evaluated using MAE, MSE, RMSE, and R², through which the clustering conditions demonstrating the best performance were finally established.

Performance evaluation of ML models for prediction on electric power consumption in empirical apartment complexes

ML models for electric power consumption prediction

The power energy consumption and weather data used in this study were preprocessed, grouped into Date, Weather, and House variables, and employed as input variables (Table 1). The instantaneous consumption of each cluster and of the multi-household apartment complexes was used as the target, so that forecasting is one-step-ahead at the same temporal resolution as the input aggregation (10-min → t + 10 min; 1-h → t + 1 h; 1-d → t + 1 day; 1-w → t + 1 week; 1-m → t + 1 month). The datasets were then partitioned chronologically in a 7:3 ratio into Training and Testing sets (no shuffling, the test window being the most recent segment), and the prediction outcomes on the held-out test window, which was excluded from model training, were used as performance indicators. The models employed in the predictive analysis were CatBoost (iterations = 500, max_ctr_complexity = 6, random_seed = 10, od_type = 'Iter', od_wait = 25, verbose = 1000, depth = 5, learning_rate = 0.03), Decision Tree (max_depth = 8), LightGBM (n_estimators = 10000, learning_rate = 0.01, verbose = 0), Random Forest (max_depth = 8, min_samples_leaf = 8, min_samples_split = 8, n_estimators = 200), and XGBoost (n_estimators = 1000). All algorithms were tuned on the training window via rolling-origin (forward-chaining) time-series cross-validation with 10 splits, with boosting methods employing early stopping on the validation block so that the effective number of boosting rounds was governed by validation loss rather than nominal maxima; final models were refit on the full training window and evaluated once on the held-out chronological test window.
For ensemble integration, the highest-performing predictor is assigned to each cluster (C0, C1) from the tuned candidate set, and the apartment-complex total at time t is obtained by deterministic additive aggregation of synchronized cluster predictions, $\hat{y}_{\mathrm{total}}(t) = \hat{y}_{C0}(t) + \hat{y}_{C1}(t)$, a choice that follows the accounting identity that complex-level demand equals the sum of segment demands and does not introduce any second-stage meta-learner (no stacking, blending, bagging, or weighted voting).
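The additive aggregation described above amounts to a timestamp-aligned sum of the per-cluster forecasts. A minimal sketch follows; the function name `complex_total` and the forecast values are hypothetical, and the per-cluster predictions stand in for whichever tuned model was selected for each cluster.

```python
from typing import Dict, List


def complex_total(cluster_forecasts: Dict[str, List[float]]) -> List[float]:
    """Sum synchronized cluster forecasts step by step (no meta-learner)."""
    lengths = {len(series) for series in cluster_forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("cluster forecasts must cover the same time steps")
    return [sum(values) for values in zip(*cluster_forecasts.values())]


# Hypothetical one-step-ahead forecasts for three consecutive periods.
forecasts = {
    "C0": [10.0, 12.0, 11.0],   # e.g. output of the model chosen for C0
    "C1": [20.0, 18.5, 21.0],   # e.g. output of the model chosen for C1
}
total = complex_total(forecasts)
print(total)
```

Because the combination is a fixed sum, no additional parameters are fitted at the ensemble stage, and any error in the total comes directly from the cluster-level models.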

Prediction on electric power consumption

To demonstrate the validity of the proposed methodology, three methods for predicting power energy consumption at the empirical site were implemented. Initially, the electric energy consumption of the apartment complex was predicted without the use of the clustering technique, serving as a control case to examine whether there is an enhancement in performance with the proposed methodology. Subsequently, the predictive performance of each model by cluster was assessed through clustering fitness evaluation. For this comparison, outcomes were measured against three error metrics (MAE, MSE, and RMSE). Ultimately, the electric power consumption of the apartment complex was forecasted using an ensemble model that amalgamated the ML predictive models for each cluster. The predictive results of the Ensemble Model were appraised using four metrics (MAE, MSE, RMSE, and R2 Score), and these outcomes were benchmarked against those of the Control Group.

Performance metrics and statistical analysis

In this study, a clustering-based ML ensemble model was proposed to enhance the prediction accuracy of electric power consumption models. To identify the optimal clustering conditions, various factors including the number of clusters and data collection intervals were defined. Performance was evaluated on the held-out chronological test window at each temporal resolution using four metrics (MAE, MSE, RMSE, R²), with symbols defined as follows: $y_i$ is the observed instantaneous power (kW) at test index $i$, $\hat{y}_i$ the corresponding prediction, $n$ the number of test samples, and $\bar{y}$ the sample mean of $y_i$.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \tag{5}$$

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{6}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{7}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$
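The four metrics can be computed directly from their standard definitions, as in the plain-Python sketch below. The function name and the sample values are hypothetical; in practice the equivalent functions in scikit-learn would normally be used.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and R² from observed and predicted values."""
    n = len(y_true)
    errors = [obs - pred for obs, pred in zip(y_true, y_pred)]
    sq_error = sum(e * e for e in errors)
    mean_obs = sum(y_true) / n
    total_var = sum((obs - mean_obs) ** 2 for obs in y_true)
    mse = sq_error / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - sq_error / total_var,
    }


# Hypothetical observed and predicted consumption values.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAE and RMSE are in the same unit as the target, whereas R² is unitless, which makes it the more convenient figure when comparing results across aggregation intervals with different magnitudes.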

Analysis of variance (ANOVA) with a post-hoc test was employed in this process. Additionally, the power energy usage predicted by the proposed ensemble model was compared with that of the control group using the independent t-test. Differences in prediction performance were considered statistically significant at probability values below 0.05 or 0.01. Statistical analyses were conducted using SPSS 15.0 software (SPSS Inc., Chicago, IL, USA). The hardware and software environment used for accessing the database, preprocessing data, and developing and evaluating the model comprised a workstation with a 13th Gen Intel® Core™ i9-13900KS CPU, an NVIDIA GeForce RTX 4090 GPU, and Samsung DDR5 RAM (32 GB × 4, 128 GB in total), running Windows 10, Python 3.10.9, and TensorFlow 2.10.0.

Results

Data collection and processing

To demonstrate the effectiveness of the methodology in this study, electric power consumption data from 348 households in apartment complexes in the metropolitan area of Korea, along with regional weather data, were collected. A total of 33,837,156 rows of energy consumption data and 99,170 rows categorized as instantaneous use data by households were collected. Weather data corresponding to the same collection period for power energy use were recorded. The hourly collected weather data comprised 17 input variables and totaled 16,529 records. Outliers in the dataset were identified: 3,799 instances of negative instantaneous use values across all households were deemed invalid measurements, removed, and treated as missing values. In the interpolation process, two methods were implemented to convert missing values into valid ones. The first method replaced missing values with the average energy consumption of all households at the respective time, processing a total of 3,051,882 missing values across all households. The second method, linear interpolation, was applied to the remaining 1,372 cases, in which energy usage values were missing for all households. Figure 2 presents box plots of the processed instantaneous-consumption data at the five resolutions (markers denote mean, median, Q1, Q3), showing a systematic reduction in dispersion as the aggregation window widens, which is consistent with the use of weekly/monthly windows for stable clustering. Figure 3 shows line plots of the complex-level mean instantaneous consumption at the same resolutions: fine scales reveal diurnal variability and short-lived fluctuations, whereas coarse aggregation suppresses high-frequency noise and makes seasonal structure apparent.

Fig. 2. Box plots of processed instantaneous consumption by time resolution (10-min, 1-h, 1-d, 1-w, 1-m).

Fig. 3. Complex-level mean instantaneous consumption at five resolutions (10-min → 1-m); fine scales capture diurnal variability and transients, while coarse scales highlight seasonal structure.

Clustering based on electric power consumption patterns in empirical apartment complexes

In this step, the clustering results of the K-Means Algorithm32 were quantitatively evaluated using the four methods previously mentioned (Elbow-Method, and Comparison on Silhouette Score, CHI, Dunn Index). Figure 4 shows the visualization of Inertia, Silhouette Score, CHI, and Dunn Index for each time interval of the dataset, in the range of Inline graphic.

Fig. 4. Variation in clustering coefficients across different intervals ((a) Inertia, (b) Silhouette Score, (c) CHI, and (d) Dunn Index). Each graph depicts how the clustering coefficients vary with an increasing number of clusters, segmented into five intervals. The horizontal axis denotes K, while the vertical axis shows the values of the clustering coefficients.

Determination of optimal clustering conditions based on clustering validity assessment

In the Elbow-Method, the first point where inertia decreases by more than 60% as K increases or before the gain diminishes due to increased inertia was identified as the interval in which the optimal K was selected37,38. When this interval is 10 min, clustering is most effective at Inline graphic and Inline graphic (Fig. 4a). Inertia is measured in 1 M (Million) units and is likewise abbreviated on the graphs for simplicity. A higher Silhouette Score indicates a superior clustering outcome34. When the data collection interval was 10 min or 1 day, the clustering coefficient peaked at Inline graphic, while at other intervals (1 h, 1 week, 1 month), the highest clustering fitness was observed at Inline graphic (Fig. 4b). Similarly, the CHI also identifies K, which indicates a high value, as the optimal K37,38. For CHI, optimal clustering was achieved when Inline graphic for all time interval conditions (Fig. 4c). Regarding the Dunn Index shown in Fig. 4d, K representing high values is also favored as in the two metrics (Silhouette Score, CHI) mentioned earlier37,38.

Because Euclidean K-Means is not shift-invariant, sub-daily phase shifts in load shape can blur cluster boundaries at fine resolutions. To address this, five candidate intervals (10-min, 1-h, 1-d, 1-w, 1-m) were compared using the four validity indices together with a 10-run stability screen based on the coefficient of variation (CV) of cluster sizes. Weekly/monthly aggregation produced higher validity scores and more stable partitions (CV < 0.5, no tiny clusters), and was therefore retained for downstream modeling.

When the clustering was repeated, the two clusters under the K = 2 condition were labeled C0 (small cluster) and C1 (large cluster), and the three clusters under the K = 3 condition were labeled C0 (small cluster), C1 (medium cluster), and C2 (large cluster). As a follow-up, the coefficient of variation (CV) was calculated based on cluster size for each cluster under each condition. CV, a measure of data volatility, indicates higher group volatility with larger values. Values exceeding 0.3 suggest problems with the data or instability in its distribution39. This study adopted a CV of 0.5, at which the standard deviation is half of the mean, as the threshold for determining clustering conditions. As shown in Table 2, when the data time interval was 10 min, 1 h, or 1 day, the CV for C0’s cluster size was 0.5 or greater, indicating unstable variation. Conversely, with data intervals of 1 week and 1 month, the CV was less than 0.5, indicating uniformly formed clusters. Thus, clustering was deemed stable for intervals of 1 week and 1 month, for both K = 2 and K = 3. Based on these findings, four clustering conditions were selected for model analysis: interval of 1 week with K = 2, interval of 1 week with K = 3, interval of 1 month with K = 2, and interval of 1 month with K = 3.

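Table 2. Assessments of clustering validity through repeated clustering under specific conditions (5-Intervals, K = 2 and K = 3). Columns 1 to 10 give the cluster size (number of households) obtained in each of the 10 simulation runs.

| Interval | K | Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Mean | SD | CV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 Minute | 2 | C0 | 140 | 1 | 139 | 140 | 122 | 120 | 1 | 120 | 120 | 1 | 90.400 | 59.065 | 0.689 |
| 10 Minute | 2 | C1 | 208 | 347 | 209 | 208 | 226 | 228 | 347 | 228 | 228 | 347 | 257.600 | 59.065 | 0.242 |
| 10 Minute | 3 | C0 | 31 | 29 | 1 | 1 | 27 | 1 | 1 | 42 | 17 | 28 | 17.800 | 14.845 | 0.879 |
| 10 Minute | 3 | C1 | 154 | 135 | 1 | 138 | 126 | 121 | 138 | 95 | 86 | 157 | 115.100 | 43.746 | 0.401 |
| 10 Minute | 3 | C2 | 163 | 184 | 346 | 209 | 195 | 226 | 209 | 211 | 245 | 163 | 215.100 | 50.039 | 0.245 |
| 1 Hour | 2 | C0 | 1 | 124 | 136 | 1 | 124 | 124 | 124 | 124 | 137 | 124 | 101.900 | 50.682 | 0.524 |
| 1 Hour | 2 | C1 | 347 | 224 | 212 | 347 | 224 | 224 | 224 | 224 | 211 | 224 | 246.100 | 50.682 | 0.217 |
| 1 Hour | 3 | C0 | 7 | 91 | 1 | 45 | 29 | 33 | 85 | 1 | 47 | 38 | 37.700 | 29.920 | 0.837 |
| 1 Hour | 3 | C1 | 132 | 91 | 132 | 146 | 158 | 157 | 111 | 123 | 124 | 134 | 130.800 | 19.374 | 0.156 |
| 1 Hour | 3 | C2 | 209 | 166 | 215 | 157 | 161 | 158 | 152 | 224 | 177 | 176 | 179.500 | 25.256 | 0.148 |
| 1 Day | 2 | C0 | 141 | 1 | 141 | 1 | 1 | 111 | 141 | 111 | 141 | 1 | 79.000 | 64.622 | 0.862 |
| 1 Day | 2 | C1 | 207 | 347 | 207 | 347 | 347 | 237 | 207 | 237 | 207 | 347 | 269.000 | 64.622 | 0.253 |
| 1 Day | 3 | C0 | 87 | 87 | 5 | 63 | 45 | 62 | 4 | 45 | 67 | 1 | 46.600 | 31.331 | 0.709 |
| 1 Day | 3 | C1 | 94 | 95 | 144 | 112 | 148 | 112 | 134 | 144 | 109 | 143 | 123.500 | 20.220 | 0.173 |
| 1 Day | 3 | C2 | 167 | 166 | 199 | 173 | 155 | 174 | 210 | 159 | 172 | 204 | 177.900 | 18.365 | 0.109 |
| 1 Week | 2 | C0 | 142 | 138 | 142 | 138 | 138 | 138 | 138 | 138 | 142 | 142 | 139.600 | 1.960 | 0.015 |
| 1 Week | 2 | C1 | 206 | 210 | 206 | 210 | 210 | 210 | 210 | 210 | 206 | 206 | 208.400 | 1.960 | 0.010 |
| 1 Week | 3 | C0 | 83 | 63 | 63 | 76 | 83 | 49 | 83 | 64 | 83 | 83 | 73.000 | 11.688 | 0.169 |
| 1 Week | 3 | C1 | 93 | 122 | 112 | 104 | 93 | 136 | 93 | 112 | 93 | 96 | 105.400 | 14.158 | 0.142 |
| 1 Week | 3 | C2 | 172 | 163 | 173 | 168 | 172 | 163 | 172 | 172 | 172 | 169 | 169.600 | 3.611 | 0.022 |
| 1 Month | 2 | C0 | 142 | 142 | 139 | 142 | 142 | 139 | 142 | 139 | 142 | 139 | 140.800 | 1.470 | 0.011 |
| 1 Month | 2 | C1 | 206 | 206 | 209 | 206 | 206 | 209 | 206 | 209 | 206 | 209 | 207.200 | 1.470 | 0.008 |
| 1 Month | 3 | C0 | 80 | 63 | 52 | 52 | 80 | 80 | 80 | 80 | 80 | 53 | 70.000 | 12.594 | 0.190 |
| 1 Month | 3 | C1 | 95 | 120 | 127 | 127 | 95 | 95 | 95 | 95 | 95 | 125 | 106.900 | 14.686 | 0.145 |
| 1 Month | 3 | C2 | 173 | 165 | 169 | 169 | 173 | 173 | 173 | 173 | 173 | 170 | 171.100 | 2.625 | 0.016 |

SD: Standard Deviation, CV: Coefficient of Variation.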

The effects of temporal aggregation on cluster geometry are visualized in Fig. 5 via two-dimensional PCA projections of standardized consumption-pattern features (Inline graphic and Inline graphic). Separation is visibly tighter and more coherent at the 1-week and 1-month intervals, consistent with the quantitative validity and stability results and with the final choice of weekly and monthly settings for forecasting (Fig. 6).

Fig. 5.

Comparison of Electricity Consumption Clustering by Temporal Resolution and Cluster Count (Inline graphic).

Fig. 6.

Comparison of Electricity Consumption Clustering by Temporal Resolution and Cluster Count (Inline graphic).

Prediction on electric power consumption in empirical apartment complexes by each ML model

Evaluation on forecasting performance for electric power consumption without clustering algorithm (control groups)

The performance of each ML model was evaluated under the four conditions (K2W, K3W, K2M, K3M) established in the clustering stage. Following 10 iterations of prediction and evaluation, the performance differences among the control groups of each model were statistically compared (Table 4). The metrics selected to evaluate model performance were MAE, MSE, RMSE, and R² Score. All algorithms (Decision Tree, Random Forest, CatBoost, LightGBM, XGBoost) were tuned by grid search with a unified rolling-origin (forward-chaining) time-series cross-validation of 10 splits on the training window; for boosting models, early stopping on the validation split determined the effective number of boosting rounds (the reported maxima served as upper bounds). The same tuning protocol was applied to the non-clustered baseline and to every per-cluster model, after which final configurations were refit on the full training window and evaluated once on the held-out chronological test window. Hyperparameter search ranges and selected settings are summarized in Table 3.
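The rolling-origin tuning protocol described above can be sketched in plain Python. This is an illustration under stated assumptions, not the study's pipeline: `rolling_origin_splits`, `moving_average_forecast`, and `grid_search` are hypothetical names, the moving-average forecaster is a stand-in for the tree-based models, and the toy series and grid are invented for the example. The split sizes follow the common expanding-window convention in which each validation fold has `n_samples // (n_splits + 1)` points.

```python
def rolling_origin_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    fold = n_samples // (n_splits + 1)
    if fold < 1:
        raise ValueError("not enough samples for the requested number of splits")
    for i in range(n_splits):
        test_start = n_samples - (n_splits - i) * fold
        yield list(range(test_start)), list(range(test_start, test_start + fold))

def moving_average_forecast(history, window):
    """Stand-in model: predict the mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def grid_search(series, windows, n_splits):
    """Return the window with the lowest mean one-step-ahead MAE across splits."""
    scores = {}
    for w in windows:
        fold_mae = []
        # A real model would be fit on the training indices of each split;
        # the stand-in forecaster has nothing to fit, so they are unused here.
        for _train_idx, test_idx in rolling_origin_splits(len(series), n_splits):
            errors = []
            for t in test_idx:
                # Only observations strictly before t are visible to the model.
                pred = moving_average_forecast(series[:t], w)
                errors.append(abs(series[t] - pred))
            fold_mae.append(sum(errors) / len(errors))
        scores[w] = sum(fold_mae) / len(fold_mae)
    best = min(scores, key=scores.get)
    return best, scores

series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]  # toy data
best_window, scores = grid_search(series, windows=[1, 2, 3], n_splits=3)
print(best_window, scores)
```

The key property is that every validation fold lies strictly after its training window, so no future observation leaks into tuning.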

Table 4.

Performance evaluation of the electric power consumption prediction model for an apartment without clustering (Control Group).

Model MAE MSE RMSE R² Score
CatBoost 8.107 ± 0.675 15.285 ± 3.136 10.495 ± 1.379 0.914 ± 0.036
Decision Tree 11.479 ± 2.970 26.473 ± 16.962 15.684 ± 4.331 0.822 ± 0.047
LightGBM 8.016 ± 1.113 14.087 ± 7.071 10.394 ± 2.111 0.926 ± 0.011
Random Forest 9.650 ± 1.907 15.195 ± 7.174 12.063 ± 2.535 0.885 ± 0.028
XGBoost 7.835 ± 1.025 15.289 ± 8.300 11.058 ± 2.525 0.920 ± 0.013
ANOVA Inline graphic Inline graphic Inline graphic Inline graphic

Inline graphic Test

Inline graphic

Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Table 3.

Summary of model parameters and grid search ranges used in prediction experiments.

For each model, the fixed parameters are listed first, followed by the range for grid search (cv = 10).

CatBoost
Fixed parameters: iterations = 500; learning_rate = 0.03; max_ctr_complexity = 6; random_seed = 10; depth = 8; od_type = 'Iter'; od_wait = 25; verbose = 1000
Range for grid search: iterations: {100, 500, 1000, 1500, 2000}; learning_rate: {0.01, 0.03, 0.05, 0.07, 0.09}; random_seed: {2, 4, 6, 8, 10}; depth: {2, 4, 6, 8, 10}; l2_leaf_reg: {1, 3, 5, 7, 9}

Decision Tree
Fixed parameters: criterion = "gini"; splitter = "best"; max_depth = 8; min_samples_split = 5
Range for grid search: max_depth: {2, 4, 6, 8, 10}; min_samples_split: {5, 10, 20, 40}

LightGBM
Fixed parameters: n_estimators = 10000; learning_rate = 0.01; verbose = 0
Range for grid search: n_estimators: {100, 500, 1000, 5000, 10000}; learning_rate: {0.01, 0.03, 0.05, 0.07, 0.09}

Random Forest
Fixed parameters: n_estimators = 200; max_depth = 8; min_samples_leaf = 8; min_samples_split = 8
Range for grid search: n_estimators: {100, 200, 500}; max_depth: {6, 8, 10, None}; min_samples_leaf: {1, 2, 4, 6, 8, 10}; min_samples_split: {1, 2, 4, 6, 8, 10}

XGBoost
Fixed parameters: n_estimators = 1000; learning_rate = 0.01; max_depth = 6; colsample_bytree = 1.0
Range for grid search: n_estimators: {100, 500, 1000, 5000}; learning_rate: {0.01, 0.05, 0.1}; colsample_bytree: {0.6, 0.8, 1.0}

Based on the results in Table 4, MSE and RMSE showed no significant difference in performance among four of the models (CatBoost, LightGBM, Random Forest, and XGBoost); only the Decision Tree performed worse on these two metrics. However, in terms of MAE and R² Score, the gradient-boosting models (CatBoost, LightGBM, and XGBoost) exhibited the highest performance, while the Random Forest and Decision Tree models performed relatively poorly. Consequently, the Decision Tree and Random Forest models were excluded from the control group used to compare the performance of the Ensemble Model.

Performance evaluation on electric power consumption prediction model by each clustering condition

In Tables 5, 6, 7 and 8, the model performance was evaluated under each of the clustering conditions previously selected, and these results were compared to those of the control group (K0) assessed in Sect. "Data collection and processing" (Table 5: MAE; Table 6: MSE; Table 7: RMSE; Table 8: R² Score).

Table 5.

Comparison of the MAE across models implemented under each clustering condition.

Model Clustering Conditions Inline graphic Test
Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
CatBoost 8.107 ± 0.675 3.100 ± 0.014 4.336 ± 0.390 4.356 ± 0.029 4.344 ± 0.377 Inline graphic Inline graphic
Decision Tree 11.479 ± 2.970 5.126 ± 0.033 7.171 ± 0.046 7.196 ± 0.030 5.161 ± 0.076 Inline graphic Inline graphic
LightGBM 8.016 ± 1.113 3.202 ± 0.006 4.481 ± 0.002 4.353 ± 0.325 3.187 ± 0.043 Inline graphic Inline graphic
Random Forest 9.650 ± 1.907 4.249 ± 0.026 5.937 ± 0.013 5.936 ± 0.012 4.235 ± 0.081 Inline graphic Inline graphic
XGBoost 7.835 ± 1.025 3.596 ± 0.007 5.100 ± 0.003 5.107 ± 0.010 3.618 ± 0.016 Inline graphic Inline graphic

Inline graphicTest

Inline graphic Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Table 6.

Comparison of mean squared error for models implemented under each clustering condition.

Model Clustering Conditions Inline graphicTest
Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
CatBoost 15.285 ± 3.136 4.521 ± 0.092 2.397 ± 0.050 4.446 ± 0.270 4.460 ± 0.251 Inline graphic Inline graphic
Decision Tree 26.473 ± 16.962 12.547 ± 0.063 7.461 ± 0.211 12.374 ± 0.173 7.593 ± 0.233 Inline graphic Inline graphic
LightGBM 14.087 ± 7.071 5.164 ± 1.016 3.016 ± 0.037 5.484 ± 0.018 3.069 ± 0.161 Inline graphic Inline graphic
Random Forest 15.195 ± 7.174 8.914 ± 0.039 5.059 ± 0.133 8.902 ± 0.045 5.222 ± 0.172 Inline graphic Inline graphic
XGBoost 15.289 ± 8.300 6.584 ± 0.139 3.580 ± 0.088 6.544 ± 0.118 3.684 ± 0.063 Inline graphic Inline graphic

Inline graphicTest

Inline graphic Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Table 7.

Comparison of RMSE for models implemented under various clustering conditions.

Model Cluster Conditions Inline graphicTest
Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
CatBoost 10.495 ± 1.379 5.976 ± 0.040 4.237 ± 0.025 5.949 ± 0.426 5.960 ± 0.407 Inline graphic Inline graphic
Decision Tree 15.684 ± 4.331 9.917 ± 0.019 7.078 ± 0.035 9.881 ± 0.052 7.109 ± 0.070 Inline graphic Inline graphic
LightGBM 10.394 ± 2.111 6.050 ± 0.466 4.445 ± 0.013 6.227 ± 0.005 4.429 ± 0.058 Inline graphic Inline graphic
Random Forest 12.063 ± 2.535 8.107 ± 0.012 5.789 ± 0.027 8.105 ± 0.017 5.778 ± 0.096 Inline graphic Inline graphic
XGBoost 11.058 ± 2.525 7.042 ± 0.038 4.957 ± 0.016 7.029 ± 0.029 4.988 ± 0.019 Inline graphic Inline graphic

Inline graphicTest

Inline graphic Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
Table 8.

Comparison of R2 scores for models implemented under various clustering conditions.

Model Cluster Conditions Inline graphicTest
Inline graphic
Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic
CatBoost 0.914 ± 0.036 0.899 ± 0.001 0.917 ± 0.003 0.896 ± 0.015 0.913 ± 0.013 Inline graphic Inline graphic
Decision Tree 0.822 ± 0.047 0.771 ± 0.001 0.796 ± 0.005 0.773 ± 0.003 0.780 ± 0.005 Inline graphic Inline graphic
LightGBM 0.926 ± 0.011 0.911 ± 0.015 0.918 ± 0.006 0.908 ± 0.000 0.909 ± 0.023 Inline graphic Inline graphic
Random Forest 0.885 ± 0.028 0.845 ± 0.000 0.862 ± 0.005 0.845 ± 0.001 0.844 ± 0.013 Inline graphic Inline graphic
XGBoost 0.920 ± 0.013 0.884 ± 0.002 0.892 ± 0.002 0.885 ± 0.001 0.865 ± 0.002 Inline graphic Inline graphic

Inline graphicTest

Inline graphic Inline graphic

Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic Inline graphic

As indicated in Table 5, all models under each clustering condition outperformed the control group in terms of MAE. Specifically, the lowest MAE was observed under K2M (Table 5). Among the models, CatBoost, LightGBM, and XGBoost outperformed the other machine learning models, with no significant difference among the three.

As shown by the MSE results in Table 6, the Model by Each Cluster under the K2M condition achieved the best performance across all models relative to the Control Group. As noted above, the gradient-boosting group (CatBoost, LightGBM, and XGBoost) outperformed the tree-based group (Decision Tree and Random Forest), as was also the case for MAE.

As with the previous metrics, the RMSE of the Model by Each Cluster improved over the Control Group in all cases (Table 7), and the RMSE under K2M was the lowest among the clustering conditions. In addition, the gradient-boosting models (CatBoost, LightGBM, and XGBoost) exhibited the best performance across all error metrics.

The R² Score across all clustering conditions revealed a trend differing from that of the three error metrics (MAE, MSE, and RMSE) discussed earlier (Table 8). Based on the R² Score, the Model by Each Cluster did not surpass the Control Group (K0). Under the K2M clustering condition, CatBoost, Decision Tree, and LightGBM showed no significant difference from K0, while Random Forest and XGBoost displayed lower performance.
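One arithmetic reason that absolute-error metrics and R² can move differently is that R² normalizes the squared error by the variance of the target, so errors and target spread can shrink together. The toy sketch below illustrates only that arithmetic; the numbers are invented, not taken from the study, and the helper names `mae` and `r2` are hypothetical.

```python
def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / len(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse / sst

# Toy targets: a wide-spread pooled series and a narrower per-cluster series.
pooled_y = [10.0, 20.0, 30.0, 40.0]
pooled_pred = [12.0, 18.0, 32.0, 38.0]       # errors of +/-2

cluster_y = [y / 2 for y in pooled_y]        # same shape, half the spread
cluster_pred = [p / 2 for p in pooled_pred]  # errors of +/-1

print(mae(pooled_y, pooled_pred), r2(pooled_y, pooled_pred))
print(mae(cluster_y, cluster_pred), r2(cluster_y, cluster_pred))
```

In this example the MAE halves while R² is unchanged, because both the squared error and the target variance fall by the same factor.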

Taken together, all experimental groups (modeled by each cluster) demonstrated improved performance relative to the control group on the selected error metrics (MAE, MSE, RMSE) and exhibited similar performance on the R² Score. The optimal clustering condition was K2M, in which the data were resampled monthly and grouped into two clusters.

Performance evaluation on electric power consumption prediction by ensemble model

An Ensemble model was constructed by integrating the highest-performing predictors from each cluster under the selected condition K2M, and its performance was compared with that of the non-clustered Control Group evaluated on the same set of 348 households. To form candidate ensembles, the per-cluster models were chosen among CatBoost (CB), LightGBM (LGBM), and XGBoost after hyperparameter tuning; deterministic additive aggregation was then applied so that the complex-level load at time t equals the sum of synchronized cluster forecasts, without a second-stage meta-learner. In the comparative analysis, Decision Tree and Random Forest in the Control Group exhibited inferior performance relative to the gradient-boosting models and were excluded from subsequent Control-versus-Ensemble comparisons based on statistical tests (Table 4).
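The deterministic additive aggregation described above amounts to summing time-aligned cluster forecasts. A minimal sketch follows, assuming each cluster's forecast is a list indexed by the same time steps; the function name `aggregate_cluster_forecasts` and the numbers are hypothetical.

```python
def aggregate_cluster_forecasts(cluster_forecasts):
    """Sum time-aligned per-cluster forecasts into one complex-level series."""
    lengths = {len(series) for series in cluster_forecasts.values()}
    if len(lengths) != 1:
        raise ValueError("cluster forecasts must cover the same time steps")
    return [sum(step) for step in zip(*cluster_forecasts.values())]

# Illustrative forecasts for two clusters at three synchronized time steps.
forecasts = {"C0": [120.0, 118.5, 121.0], "C1": [210.0, 205.5, 214.0]}
total = aggregate_cluster_forecasts(forecasts)
print(total)  # [330.0, 324.0, 335.0]
```

Because no second-stage learner is involved, the complex-level forecast can be audited by inspecting each cluster's contribution directly.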

All models operate as one-step-ahead forecasters at the same temporal resolution as the input aggregation (10-min → t + 10 min; 1-h → t + 1 h; 1-d → t + 1 day; 1-w → t + 1 week; 1-m → t + 1 month). A day-ahead (24-hour) profile is visualized in Fig. 7 by recursively applying the one-step-ahead hourly model across the next 24 steps within the held-out chronological test window. All models reproduce the daily shape (morning ramp, midday dip, evening peak), but the ensembles (e.g., CB–LGBM, LGBM–LGBM) track the observed peak more closely and reduce under- and overshoot around the late-afternoon ramp, indicating better peak-load capture.
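Recursive use of a one-step-ahead model for a 24-step profile can be sketched as below. This is an illustration only: `recursive_forecast` is a hypothetical helper, and `seasonal_naive` is a stand-in model (it repeats the value from 24 steps earlier), not one of the forecasters evaluated in the study.

```python
def recursive_forecast(one_step_model, history, steps):
    """Apply a one-step-ahead model repeatedly, feeding predictions back in."""
    window = list(history)
    predictions = []
    for _ in range(steps):
        next_value = one_step_model(window)
        predictions.append(next_value)
        window.append(next_value)  # the prediction becomes the newest input
    return predictions

def seasonal_naive(window, period=24):
    """Hypothetical stand-in model: repeat the value observed one period ago."""
    return window[-period]

last_day = [float(h) for h in range(24)]  # toy hourly profile
day_ahead = recursive_forecast(seasonal_naive, last_day, steps=24)
print(day_ahead[:3])
```

Since each step consumes earlier predictions rather than observations, errors can accumulate over the horizon, which is worth keeping in mind when reading a 24-hour profile produced this way.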

Fig. 7.

Performance Evaluation of CB-Based and LGBM-Based Models for 24-Hour Electricity Consumption.

Figure 8 presents the outcomes of the t-test that compares the performances (MAE, MSE, RMSE, and R2 Score) of the Control Group and the Ensemble Models. Across the four metrics, the Ensemble configurations showed clear gains over the Control Group. In particular, the CB–CB, CB–LGBM, LGBM–CB, and LGBM–LGBM combinations formed the final ensemble set, each improving MAE, MSE, and RMSE and also raising R² at the complex level. For the K2M setting, the CB–LGBM family achieved the strongest overall performance, with complex-level R² Inline graphic, while maintaining lower absolute-error measures than the non-clustered baseline (see Tables 5, 6, 7 and 8 for MAE/MSE/RMSE/R² comparisons under clustered vs. K0 conditions).

Fig. 8.

Comparison of metrics between control group and ensemble models (a) MAE, (b) MSE, (c) RMSE, and (d) R² Score.

In each chart of Fig. 8, the vertical axis represents the average and variance of metrics from the four ML model performances (CatBoost, LightGBM, Random Forest, and XGBoost) within the control group, while the horizontal axis lists each model from both the Ensemble Models and the Control Group in ascending order of the average metric. Symbols denote the significance of the performance differences between each model in the Control Group and the Ensemble Models: '#' denotes a non-significant difference, whereas '*' indicates a statistically significant difference within the scope of Inline graphic. The statistical comparison reveals that the Ensemble Models significantly enhance performance across all four metrics when compared to the ML models of the control group (Fig. 8).

Discussion

Accurate apartment-level electricity demand prediction is hindered by heterogeneous household behavior, pronounced climate-driven variability, and data imperfections that can destabilize downstream clustering and learning. This study presents a practical methodology for analyzing and forecasting consumption in a metropolitan Korean apartment complex by integrating the K-Means clustering algorithm32 with a machine-learning ensemble, grouping households with similar usage profiles and training an optimized model for each cluster. Analyses use two-year time-series of smart-metered power and meteorological covariates. In the metropolitan area of Korea, characterized by a continental climate, seasonal variability and diurnal changes are pronounced40, and these climatic characteristics directly shape load patterns. A data-quality audit identified 9.023% of records as outliers or missing. To restore completeness while preserving tractability, same-time average imputation across households was applied first, followed by linear interpolation for residual gaps. This sequence limits instability from long gaps—where simple interpolation performs poorly—and enables consistent feature extraction for clustering and model training. Mean substitution can attenuate extremes, particularly under non-random missingness; however, the operational objective here prioritizes reductions in absolute prediction error for tariffing and system operations. The pre-processed data were reconstructed as instantaneous power series and evaluated at several temporal resolutions. Integrating K-Means with machine-learning forecasters—training an optimized model for each cluster and aggregating predictions—improved error metrics while maintaining ease of implementation and offering a deployable pathway for utilities.
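The two-step gap filling described above (same-time average across households, then linear interpolation for residual gaps) can be sketched as follows. This is a simplified illustration under assumptions: readings are held in a list of per-household lists with `None` for missing values, the function name `impute` is hypothetical, and gaps at the very start or end of a series are left untouched because interpolation needs a known value on both sides.

```python
def impute(load):
    """load[h][t] is household h at time t; None marks a missing reading."""
    n_times = len(load[0])
    filled = [list(row) for row in load]
    # Step 1: same-time average across households that did report.
    for t in range(n_times):
        observed = [row[t] for row in load if row[t] is not None]
        if observed:
            avg = sum(observed) / len(observed)
            for row in filled:
                if row[t] is None:
                    row[t] = avg
    # Step 2: linear interpolation over gaps that step 1 could not fill.
    for row in filled:
        known = [t for t, v in enumerate(row) if v is not None]
        for left, right in zip(known, known[1:]):
            for t in range(left + 1, right):
                frac = (t - left) / (right - left)
                row[t] = row[left] + frac * (row[right] - row[left])
    return filled

# Toy data: time 1 is missing for one household, time 2 for both.
load = [[1.0, None, None, 4.0],
        [3.0, 2.0, None, 6.0]]
print(impute(load))
```

In the toy data, time 1 is filled from the other household, while time 2 is missing everywhere and therefore falls through to interpolation within each household's own series.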

K-Means was adopted to classify household consumption patterns because of its ease of implementation and computational efficiency on large datasets41–43. To mitigate arbitrariness, the optimal K was selected using a multi-metric protocol (Elbow, Silhouette, Calinski–Harabasz (CHI), and Dunn) applied across five time resolutions33–36. The Elbow analysis indicated a sharp inertia reduction to Inline graphic at Inline graphic, after which gains stabilized, defining the candidate range33. Robustness was then checked by ten replications per setting and by cluster-size variability (CV); conditions showing instability (e.g., very small clusters) were screened out, yielding four practical settings (K2W, K2M, K3W, K3M) for downstream use. Two design choices increased actionability. First, as K-Means is scale-sensitive, input vectors for clustering were standardized within each candidate resolution, ensuring that distance computations reflect pattern differences rather than unit scales. Second, clustering was performed on consumption patterns rather than mixed weather–load features to keep segments interpretable and inclusive; meteorological variables were retained for the prediction stage (not for defining segments), aligning the segmentation with tariff design and DR targeting while avoiding weather-driven confounding in the cluster geometry. These results contrast with prior clustering–forecasting studies that chose larger K or single-criterion rules and then applied uniform predictors post-clustering26,27. Here, validated, low-complexity segmentation (notably Inline graphic at monthly aggregation) delivered forecasting gains without sacrificing interpretability; importantly, model outputs were integrated by summing cluster-level forecasts to reconstruct complex-level load, rather than by stacking/blending, which simplifies deployment in utility workflows.
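The standardization step mentioned above can be illustrated with a small z-score sketch. The feature values and the names `standardize` and `euclidean` are hypothetical; the point is only that, without scaling, a feature measured in large units dominates the Euclidean distances that K-Means relies on.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Z-score each column so every feature has mean 0 and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Toy features per household: [monthly energy, evening-use share].
rows = [[300.0, 0.2], [320.0, 0.8], [900.0, 0.3], [880.0, 0.7]]
scaled = standardize(rows)

print(euclidean(rows[0], rows[1]))      # driven almost entirely by the first feature
print(euclidean(scaled[0], scaled[1]))  # both features now contribute
```

After scaling, the difference in evening-use share between the first two households is no longer swamped by the energy column.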

The electric energy usage prediction performance under each clustering condition was evaluated against a non-clustered control (K0) tuned by grid search with 10-fold cross-validation, which served as the baseline for assessing gains from the proposed methodology. Basic models (Decision Tree, Random Forest) and gradient-boosting models (CatBoost, LightGBM, XGBoost) were trained for the Control Group, the Model by Each Cluster, and the Ensemble Model, confirming adaptability across algorithm families. According to Table 4, there was no material difference among the three boosting models in MSE/RMSE, whereas MAE and R² revealed clear underperformance of Decision Tree and Random Forest. This outcome reflects the limits of single-tree learners in capturing complex targets and the superior capacity of gradient boosting, which builds on tree bases but controls overfitting and variance44,45. In particular, CatBoost's overfitting-prevention mechanisms46 and LightGBM's Gradient-Based One-Side Sampling (GOSS)47 were effective for the high-dimensional feature set (~370 variables) combining weather and household usage, while XGBoost, despite fast parallel training, showed slightly lower generalization than CatBoost/LightGBM48. In Sect. "Performance evaluation on electric power consumption prediction model by each clustering condition", cluster-specific models (K2W, K2M, K3W, K3M) improved MAE, MSE, and RMSE relative to K0 across all settings; the largest gains occurred under K2M (MAE 55.552%, MSE 80.091%, RMSE 59.272%), indicating effective grouping of households with similar usage patterns. By contrast, R² for K2M matched K0 for CatBoost/Decision Tree/LightGBM and was slightly lower for Random Forest/XGBoost (Table 8). This divergence is attributed to cluster homogenization: frequent moderate errors are reduced (improving absolute-error metrics) while the system-level variance structure changes little, limiting R² gains.
Complex-level forecasts were obtained by summing cluster-level predictions (no stacking, blending, or bagging), a choice that simplifies deployment and preserves the observed error reductions for apartment-level operations.

In heterogeneous customer portfolios, cluster-wise learning with deterministic additive aggregation (C0 + C1 → complex total) offers a general template for high-fidelity yet auditable forecasting. In the present framework, the best predictor is assigned to each cluster (C0, C1) and the resulting forecasts are combined transparently at the total-load level; this preserves interpretability at deployment and aligns with the accounting identity of demand aggregation. Under the K2M condition, the cluster-aware ensemble attained R² Inline graphic (10-fold) at the complex level, indicating that most variance in apartment-level load is recovered once cluster structure is learned and predictions are aggregated. On the same unitless metric, reported single-model baselines in related domains are lower: Abumohsen et al.17 (utility dataset) R² Inline graphic with RF; Yin et al.18 (equipment energy) R² Inline graphic with LightGBM; Zhang et al.19 (short-term load) R² Inline graphic with an XGBoost-AOA hybrid, underscoring task-granularity effects. The CatBoost + LightGBM pairing is preferred on both technical and empirical grounds. CatBoost reduces target leakage via ordered boosting and handles missing/categorical signals robustly, curbing overfitting in rich feature spaces; LightGBM improves sample/feature efficiency through GOSS/EFB, lowering variance without sacrificing bias. These complementary inductive biases are well-matched to the study’s high-dimensional predictors (apartment-level usage plus meteorological covariates), yielding more stable generalization than alternatives that substitute a different tree booster under the same inputs. 
Importantly, the ensemble’s R² gain relative to cluster-only models is not merely a numerical improvement: it reflects a two-stage effect, (i) within-cluster homogenization that suppresses frequent moderate errors and (ii) aggregation-level variance reconstruction that restores system-level structure lost when clusters are evaluated in isolation.

However, the dataset used in this investigation was acquired from an apartment complex in the metropolitan area of Korea, representing a specific housing type; external validity is therefore bounded to environments with similar characteristics. In addition, variables related to date, day of the week, and weather were major input features; detailed building-physics and social/occupancy attributes (e.g., spatial volume, wall thickness, number of windows, household size and changes due to move-ins/outs) were unavailable. Data completeness was restored via same-time average imputation followed by short-gap linear interpolation, which can attenuate extremes at imputed timestamps. Future work could expand generalizability by considering variations in power-usage patterns across regions (latitude-driven climate differences) and other residential types (single-family homes, multi-family dwellings, row houses, officetels). Acquiring richer building and occupancy data—subject to privacy and access constraints—offers a path to further accuracy gains. Despite these constraints, the proposed methodology demonstrated high predictive accuracy in specific environments, providing a foundation for effective application where similar characteristics hold. Conclusively, the clustering-based ensemble framework maps naturally to utility practice—supporting utility load forecasting and smart-grid optimization through segmentable tariff design, targeted demand-response, and PV/ESS scheduling—while remaining deployable without additional data collection or complex second-stage modeling.

Conclusion

This study proposes an efficient power-demand prediction framework that clusters households by energy-use patterns and integrates the optimal model for each cluster into a cohesive ensemble, delivering accurate, deployment-ready forecasts without additional data collection or complex meta-learning; in the metropolitan-Korea demonstration, the approach reduced errors by 55.6% (MAE), 80.1% (MSE), and 59.3% (RMSE) versus the non-clustered baseline and achieved R² Inline graphic at the complex level. The contribution is to combine quantitative cluster validity with cluster-aware model assignment and transparent (additive) aggregation that reconstructs building-level demand, thereby capturing seasonal and daily fluctuations in continental-climate regions and outperforming traditional predictors that ignore intermediate clustering. In practical terms, the framework provides a usable pathway for smart grids and energy management systems: utilities can employ it for peak-load forecasting, demand-side management, and tariff design, building managers can target HVAC scheduling and efficiency actions by segment, and policymakers can design evidence-based demand-response and incentive programs that promote sustainability, cost reduction, and grid resilience.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (13.1MB, csv)

Acknowledgements

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) under the metaverse support program to nurture the best talents (IITP-2025-RS-2023-00254529) grant funded by the Korea government(MSIT).

Author contributions

Taeyong Sim and Sanghyun Ryu conceived the study; designed the experiments; collected and preprocessed the data; performed data analysis and predictive modeling; coordinated adjustments based on experimental results; collated the final dataset; and drafted and critically revised the manuscript. Dongjun Lee provided critical feedback through in-depth review of the manuscript and drafted and critically revised the manuscript. Sujin Lee and Chang-Jae Chun provided critical feedback through in-depth review of the manuscript. Hyeonjoon Moon reviewed the revised manuscript and gave final approval for submission to Scientific Reports.

Data availability

The electric usage data that support the findings of this study are available from the Korea Institute of Energy Research (KIER), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the Korea Institute of Energy Research (KIER). The weather datasets generated and/or analysed during the current study are available in the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) repository, https://data.kma.go.kr.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Taeyong Sim and Sanghyun Ryu contributed equally to this work.

References

  • 1.Cetina, Q., Roscoe, R. A. J. & Wright, P. S. Challenges for smart electricity meters due to dynamic power quality conditions of the grid: A review. In 2017 IEEE International Workshop on Applied Measurements for Power Systems (AMPS) 1–6 (IEEE, 2017)
  • 2.Soares, A., Gomes, Á. & Antunes, C. H. Categorization of residential electricity consumption as a basis for the assessment of the impacts of demand response actions. Renew. Sustain. Energy Rev.30, 490–503 (2014). [Google Scholar]
  • 3.Keles, D. & Yilmaz, H. Ü. Decarbonisation through coal phase-out in Germany and Europe—Impact on Emissions, electricity prices and power production. Energy Policy. 141, 111472 (2020). [Google Scholar]
  • 4.Andersen, A. D. & Gulbrandsen, M. The innovation and industry dynamics of technology phase-out in sustainability transitions: insights from diversifying petroleum technology suppliers in Norway. Energy Res. Social Sci.64, 101447 (2020). [Google Scholar]
  • 5.Zheng, G., Li, K. & Wang, Y. The effects of high-temperature weather on human sleep quality and appetite. Int. J. Environ. Res. Public Health. 16 (2), 270 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Darçın, M. Association between air quality and quality of life. Environ. Sci. Pollut. Res.21 (3), 1954–1959 (2014). [DOI] [PubMed] [Google Scholar]
  • 7.Balbus, J. et al. Introduction: Climate Change and Human Health. The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment. 25–42 ( U.S. Global Change Research Program, 2016). 10.7930/J0VX0DFW
  • 8.Sarofim, M. C. et al. Temperature-Related Death and Illness. The Impacts of Climate Change on Human Health in the United States: A Scientific Assessment 43–68 (U.S. Global Change Research Program, 2016). 10.7930/J0MG7MDX
  • 9.Ebi, K. L. et al. Hot weather and heat extremes: health risks. Lancet398 (10301), 698–708 (2021). [DOI] [PubMed] [Google Scholar]
  • 10.Anvari-Moghaddam, A., Guerrero, J. M., Vasquez, J. C., Monsef, H. & Rahimi‐Kian, A. Efficient energy management for a grid‐tied residential microgrid. IET Gener. Transm. Distrib.11(11), 2752–2761 (2017). [Google Scholar]
  • 11.Moore, F. Environmental Control Systems: Heating, Cooling, Lighting (1993).
  • 12.Moon, J., Park, S., Rho, S. & Hwang, E. Robust Building energy consumption forecasting using an online learning approach with R ranger. J. Building Eng. Volume. 47, 2352–7102. 10.1016/j.jobe.2021.103851 (2022). [Google Scholar]
  • 13.Korea Research Institute for Human Settlements. Korea Housing Survey, 2021 (Accessed 31st July 2023) (2021).
  • 14.Wen, L., Zhou, K. & Yang, S. Load demand forecasting of residential buildings using a deep learning model. Electr. Power Syst. Res.179, 0378–7796. 10.1016/j.epsr.2019.106073 (2020). [Google Scholar]
  • 15.Mel Keytingan, M., Shapi, N. A., Ramli, Lilik, J. & Awalin,. Energy consumption prediction by using machine learning for smart building: Case study in Malaysia. Dev. Built Environ.5, 100037. 10.1016/j.dibe.2020.100037 (2021). [Google Scholar]
  • 16.Nivethitha Somu, Gauthama Raman, M. R. & Krithi Ramamritham A hybrid model for Building energy consumption forecasting using long short term memory networks. Appl. Energy. 261, 0306–2619. 10.1016/j.apenergy.2019.114131 (2020). [Google Scholar]
  • 17.Abumohsen, M., Owda, A. Y. & Owda, M. Electrical load forecasting based on random forest, XGBoost, and linear regression algorithms. In 2023 International Conference on Information Technology (ICIT) 25–31 (IEEE, 2023).
  • 18.Yin, Z. et al. Pump feature construction and electrical energy consumption prediction based on feature engineering and LightGBM algorithm. Sustainability 15(1), 789 (2023).
  • 19.Zhang, L. & Jánošík, D. Enhanced short-term load forecasting with hybrid machine learning models: CatBoost and XGBoost approaches. Expert Syst. Appl. 241, 122686 (2024).
  • 20.Kim, D., Yim, T. & Lee, J. Y. Analytical study on changes in domestic hot water use caused by COVID-19 pandemic. Energy 231, 120915 (2021).
  • 21.Mardani, A., Liao, H., Nilashi, M., Alrasheedi, M. & Cavallaro, F. A multi-stage method to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine learning techniques. J. Clean. Prod. 275, 122942 (2020).
  • 22.Morteza, A. et al. Deep learning hyperparameter optimization: application to electricity and heat demand prediction for buildings. Energy Build. 289, 113036 (2023).
  • 23.González-Vidal, A., Mendoza-Bernal, J., Niu, S., Skarmeta, A. F. & Song, H. A transfer learning framework for predictive energy-related scenarios in smart buildings. IEEE Trans. Ind. Appl. 59(1), 26–37 (2022).
  • 24.Jordan, M. I. & Mitchell, T. M. Machine learning: Trends, perspectives, and prospects. Science 349(6245), 255–260 (2015).
  • 25.Kraus, M., Feuerriegel, S. & Oztekin, A. Deep learning in business analytics and operations research: Models, applications and managerial implications. Eur. J. Oper. Res. 281(3), 628–641 (2020).
  • 26.Liu, H., Liu, Y., Huang, H., Wu, H. & Huang, Y. Energy consumption dynamic prediction for HVAC systems based on feature clustering deconstruction and model training adaptation. Build. Simul. 17, 1439–1460 (2024).
  • 27.Han, F., Pu, T., Li, M. & Taylor, G. Short-term forecasting of individual residential load based on deep learning and K-means clustering. CSEE J. Power Energy Syst. 7(2), 261–269 (2020).
  • 28.Li, K., Zhang, J., Chen, X. & Xue, W. Building’s hourly electrical load prediction based on data clustering and ensemble learning strategy. Energy Build. 261, 111943 (2022).
  • 29.Culaba, A. B., Del Rosario, A. J. R., Ubando, A. T. & Chang, J. S. Machine learning-based energy consumption clustering and forecasting for mixed‐use buildings. Int. J. Energy Res. 44(12), 9659–9673 (2020).
  • 30.Zhao, Q., Xu, M. & Fränti, P. Sum-of-squares based cluster validity index and significance analysis. In International Conference on Adaptive and Natural Computing Algorithms 313–322 (Springer Berlin Heidelberg, 2009).
  • 31.Open MET Data Portal. Meteorological data (2015). Retrieved December 1, 2024, from https://data.kma.go.kr
  • 32.MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics 5, 281–298 (University of California Press, 1967).
  • 33.Thorndike, R. L. Who belongs in the family? Psychometrika 18(4), 267–276 (1953).
  • 34.Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
  • 35.Caliński, T. & Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. Theory Methods 3(1), 1–27 (1974).
  • 36.Dunn, J. C. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters (1973).
  • 37.Bholowalia, P. & Kumar, A. EBK-means: A clustering technique based on elbow method and k-means in WSN. Int. J. Comput. Appl. 105(9), 17–24 (2014).
  • 38.Kosowski, P., Kosowska, K. & Janiga, D. Primary energy consumption patterns in selected European countries from 1990 to 2021: a cluster analysis approach. Energies 16(19), 6941 (2023).
  • 39.Brown, C. E. Applied Multivariate Statistics in Geohydrology and Related Sciences 155–157 (Springer, Berlin, Heidelberg, 1998).
  • 40.Kottek, M., Grieser, J., Beck, C., Rudolf, B. & Rubel, F. World map of the Köppen-Geiger climate classification updated (2006).
  • 41.Pham, D. T., Dimov, S. S. & Nguyen, C. D. Selection of K in K-means clustering. Proc. Inst. Mech. Eng. Part C 219(1), 103–119 (2005).
  • 42.Bock, H. H. Clustering methods: A history of k-means algorithms. In Selected Contributions in Data Analysis and Classification (eds Brito, P. et al.) 161–172 (Springer, Berlin, Heidelberg, 2007).
  • 43.Wu, J. Advances in K-means Clustering: A Data Mining Thinking (Springer Science & Business Media, 2012).
  • 44.Bentéjac, C., Csörgő, A. & Martínez-Muñoz, G. A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54, 1937–1967 (2021).
  • 45.Dev, V. A. & Eden, M. R. Gradient boosted decision trees for lithology classification. In Proceedings of the 9th International Conference on Foundations of Computer-Aided Process Design (FOCAPD 2019) (eds Garcia Muñoz, S. et al.), Vol. 47, 113–118 (Elsevier, 2019).
  • 46.Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31, 6638–6648 (2018).
  • 47.Ke, G. et al. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017).
  • 48.Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (2016).

Supplementary Materials

Supplementary Material 1 (13.1MB, csv)

Data Availability Statement

The electricity usage data that support the findings of this study are available from the Korea Institute of Energy Research (KIER), but restrictions apply to their availability: the data were used under license for the current study and are therefore not publicly available. The data are, however, available from the authors upon reasonable request and with the permission of KIER. The weather datasets generated and/or analysed during the current study are available in the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) repository, https://data.kma.go.kr.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group