Abstract
Air pollution poses a significant challenge to public health and the global environment. The Industrial Revolution, advancing technology and society, led to elevated air pollution levels, contributing to acid rain, smog, ozone depletion, and global warming. Poor air quality increases risks of respiratory inflammation, tuberculosis, asthma, chronic obstructive pulmonary disease (COPD), pneumoconiosis, and lung cancer.
In this context, developing reliable air pollution forecasting models is imperative for guiding effective mitigation strategies and policy interventions. This study presents a daily air pollution prediction model focusing on Jakarta's sulfur dioxide (SO₂) and carbon monoxide (CO) levels, leveraging a hybrid methodology that integrates Clustering Large Applications (CLARA) with the Fuzzy Time Series Markov Chain (FTSMC) approach.
The analysis revealed five distinct clusters, with medoid selection refined iteratively to ensure stabilization. A 5 × 5 Markov transition probability matrix was subsequently constructed for modeling the data. Predicted values for SO₂ and CO in Jakarta using the CLARA-FTSMC hybrid method showed strong alignment with the actual data. Forecasting accuracy results for SO₂ and CO in Jakarta, based on Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), showed excellent performance, underscoring the efficacy of the CLARA-FTSMC hybrid approach in predicting air pollution levels.
- The CLARA-FTSMC hybrid method demonstrates high effectiveness in analyzing large datasets, addressing the limitations of previous hybrid clustering fuzzy time series methods.
- The number of fuzzy time series partitions is optimally determined based on clustering results obtained through the gap statistic approach, ensuring robust partitioning.
- The forecasting accuracy of the CLARA-FTSMC hybrid method, evaluated using MAE and RMSE, showed excellent performance in predicting daily air pollution levels of SO₂ and CO in Jakarta.
Keywords: Air pollution, CLARA, Clustering, Forecasting, Hybrid clustering fuzzy time series
Method name: Hybrid Clustering Large Applications and Fuzzy Time Series Markov Chain
Specifications table
| Subject area: | Mathematics and Statistics |
| More specific subject area: | Statistics; Hybrid Clustering Fuzzy Time Series; Air Pollution |
| Name of your method: | Hybrid Clustering Large Applications and Fuzzy Time Series Markov Chain |
| Name and reference of original method: | Finding Groups in Data: An Introduction to Cluster Analysis (1991), Gentle JE, Kaufman L, Rousseeuw PJ, Biometrics, Vol. 47, 788 p. A fuzzy time series-Markov chain model with an application to forecast the exchange rate between the Taiwan and US dollar (2012), Tsaur RC, Int J Innov Comput Inf Control, 8(7 B):4931–42. |
| Resource availability: | The data utilized in this study were sourced from the official website of the DKI Jakarta Environment Agency (https://lingkunganhidup.jakarta.go.id/). The dataset comprises daily air pollution standard index records spanning from January 2021 to August 31, 2024, ensuring comprehensive coverage for the analysis period. |
Background
Air pollution remains one of the most pressing challenges for public health and the global environment. Since the Industrial Revolution, significant advancements in technology, energy, and societal development have provided immense benefits to humanity. However, these advancements have also resulted in severe environmental consequences, particularly the escalation of air pollution. Air pollution contributes to numerous environmental issues, including acid rain, smog, ozone depletion, and global warming [1]. Moreover, poor air quality is strongly associated with various health risks, such as respiratory inflammation, tuberculosis, asthma, chronic obstructive pulmonary disease (COPD), pneumoconiosis, and lung cancer [2,3].
Air pollution is defined as the alteration of air composition due to the presence of hazardous substances, including particulate matter (PM), sulfur dioxide (SO₂), carbon monoxide (CO), nitrogen dioxide (NO₂), and heavy metals. According to the World Air Quality Report (2023), Indonesia ranks as the most polluted country in Southeast Asia [4]. As of June 23, 2024, Jakarta recorded the second-highest air pollution level globally, with an Air Quality Index (AQI) of 160 [5]. The primary drivers of air pollution in Jakarta include industrialization, fossil fuel combustion, mining activities, and a 10% annual increase in private vehicle ownership [6,7].
Although the Jakarta government has implemented various policies, such as promoting public transportation, deploying air quality task forces, and using parking fees as a disincentive, air quality continues to deteriorate. Therefore, developing accurate air pollution forecasting models is increasingly critical to support the formulation of effective mitigation policies.
One interesting approach in time series analysis is the use of fuzzy logic-based methods. Fuzzy logic has been shown to give effective results in various practical problems, including the forecasting of time series data [8]. It allows uncertainty in time series data to be incorporated, which is often caused by variations and external factors that classical analysis methods have difficulty accounting for [9].
Fuzzy time series (FTS) was introduced by Song and Chissom [10] to predict enrollments at the University of Alabama. Since then, various FTS methods have been developed, such as weighted [11], Chen [12], Markov [13], and multiple attributes [14]. One of the recent methods that has received attention in fuzzy-based time series analysis is the fuzzy time series Markov chain (FTSMC). FTSMC is an approach that combines fuzzy logic concepts with Markov chain models to forecast future values based on fuzzy partitions of time series data. The partitioning allows various states to occur in the time series, whereas the fuzzy concept allows us to measure the degree of membership of each state in each partition.
Based on the studies conducted by [15,16], FTSMC is the most preferable method in terms of MSE and MAPE compared with other FTS methods. However, FTS has some open issues: there is no definite rule for determining the number of partitions, and the length of the interval has no definite formula [17]. The relationship between the number of partitions and the interval length has been addressed in previous research [15,18]. In fact, the number of partitions and the length of the interval greatly affect the formation of the fuzzy logical relationships (FLR), resulting in differences in the accuracy of the forecasting results.
Therefore, the selection of the optimal number of partitions is an interesting problem that needs to be discussed. Some previous studies have tried to incorporate clustering methods to optimize the partitions in the FTS method [19–22]. However, in determining the optimal partitioning, k-means and k-medoid methods still have shortcomings, as they are less effective in analyzing large data compared to improved methods such as clustering large applications (CLARA).
CLARA is robust to large amounts of data and can cope with outliers [20]. Thus, incorporating CLARA in the FTSMC analysis stage results in a flexible method for forecasting large amounts of daily air pollution data. This research aims to develop a prediction model for daily sulfur dioxide (SO₂) and carbon monoxide (CO) air pollution in Jakarta, using a hybrid approach of Clustering Large Applications (CLARA) and Fuzzy Time Series Markov Chain (FTSMC). The model is expected to provide more accurate projections to support strategic decision-making in urban air pollution mitigation.
Method details
Fuzzy Time Series
The Fuzzy Time Series (FTS) method typically utilizes historical data in linguistic form [21]. The FTS process consists of defining the universe of discourse $U$, partitioning $U$ into several intervals, fuzzification, establishing fuzzy relationships, and defuzzification.
Definition 1
Let $U = \{u_1, u_2, \ldots, u_n\}$ be the universe of discourse, where $u_i$ represents possible linguistic values within $U$. The linguistic fuzzy variable $A_i$ of $U$ is defined as:
| $A_i = \dfrac{\mu_{A_i}(u_1)}{u_1} + \dfrac{\mu_{A_i}(u_2)}{u_2} + \cdots + \dfrac{\mu_{A_i}(u_n)}{u_n}$ | (1) |
where $\mu_{A_i}$ is the membership function of fuzzy set $A_i$ and $\mu_{A_i}: U \to [0, 1]$.
Definition 2
Let $Y(t)$ be a real-valued time series, where $f_i(t)$ is defined over the fuzzy set $A_i$. Then $F(t)$ represents the fuzzy time series of $Y(t)$.
Definition 3
If $F(t)$ is caused by $F(t-1)$, then the fuzzy logical relationship (FLR) is expressed as: $F(t-1) \to F(t)$.
Definition 4
If an FLR originates from state $A_i$ and transitions to other states $A_j$ ($j = 1, 2, \ldots, n$), such as $A_i \to A_{j1}, A_i \to A_{j2}, \ldots, A_i \to A_{jn}$, the FLRs are grouped into a fuzzy logical relationship group (FLRG) as follows:
| $A_i \to A_{j1}, A_{j2}, \ldots, A_{jn}$ | (2) |
Fuzzification transforms numerical data into linguistic values, forming the FLR. This step requires determining the upper and lower bounds using the following equations:
| $ub_i = \dfrac{c_i + c_{i+1}}{2}$ | (3) |
Here $ub_i$ is the upper boundary of the $i$-th interval and $c_i$ is the center (medoid) of the $i$-th cluster, while the lower boundary of the next interval is $lb_{i+1} = ub_i$. For the first and last clusters, where no prior or subsequent centers exist, the lower bound $lb_1$ and upper bound $ub_k$ are computed as:
| (4) |
| (5) |
Fuzzy time series Markov chain
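The transition probability matrix in a Markov chain is constructed as an $n \times n$ matrix, where $n$ represents the number of fuzzy sets [22]. The equation to determine the transition probability between states is as follows:
| $P_{ij} = \dfrac{M_{ij}}{M_i}, \quad i, j = 1, 2, \ldots, n$ | (6) |
where:
$P_{ij}$: Transition probability from state $A_i$ to $A_j$
$M_{ij}$: Number of transitions from state $A_i$ to $A_j$
$M_i$: Total number of data points in state $A_i$
The transition probability matrix $P$ can be expressed as:
$P = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{bmatrix}$
1) The initial forecast values are determined using the following rules:
| Rule 1. | If fuzzy set $A_i$ does not have a fuzzy logical relationship (FLR) ($A_i \to \varnothing$), and $Y(t-1)$ at time $t-1$ falls into $A_i$, then the forecast value $F(t)$ is $m_i$, where $m_i$ is the midpoint of interval $u_i$. |
| Rule 2. | If the fuzzy logical relationship group (FLRG) of $A_i$ represents a one-to-one relationship ($A_i \to A_q$, with $P_{iq} = 1$), and $Y(t-1)$ at time $t-1$ falls into state $A_i$, then the forecast value $F(t)$ is $m_q$, where $m_q$ is the midpoint of $u_q$ in the FLRG formed at time $t$. |
| Rule 3. | If the FLRG of $A_i$ represents a one-to-many relationship ($A_i \to A_1, A_2, \ldots, A_q$, $j = 1, 2, 3, \ldots, q$), and $Y(t-1)$ at time $t-1$ falls into state $A_i$, the forecast $F(t)$ is calculated as: |
| | $F(t) = m_1 P_{i1} + m_2 P_{i2} + \cdots + m_{i-1} P_{i(i-1)} + Y(t-1) P_{ii} + m_{i+1} P_{i(i+1)} + \cdots + m_q P_{iq}$ |
| | where $m_1, m_2, \ldots, m_q$ are the midpoints of $u_1, u_2, \ldots, u_q$, and $m_i$ is replaced with $Y(t-1)$ for state $A_i$ to improve forecasting accuracy. |
2) To improve the forecast accuracy, an adjustment is made by adding the difference between the actual value $Y(t)$ and the previous value $Y(t-1)$, as follows:
| $F'(t) = F(t) + \mathrm{diff}(Y(t))$ | (8) |
$\mathrm{diff}(Y(t))$ is the difference between the actual value $Y(t)$ at time $t$ and the previous actual value $Y(t-1)$:
| $\mathrm{diff}(Y(t)) = Y(t) - Y(t-1)$ | (9) |
where:
| $Y(t)$ | Actual data at period $t$ |
| $F(t)$ | Initial forecast result at period $t$ |
| $F'(t)$ | Adjusted forecast result at period $t$ |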
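The two forecasting stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: states are 0-based integers, the function names (`transition_matrix`, `initial_forecast`, `adjusted_forecast`) are hypothetical, and the state sequence and midpoints are toy values rather than the Jakarta data.

```python
import numpy as np

def transition_matrix(states, n_states):
    """P[i, j] = M_ij / M_i: row-normalised transition counts (Eq. 6)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # A state with no outgoing transition keeps an all-zero row (Rule 1).
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def initial_forecast(prev_value, prev_state, P, midpoints):
    """Initial forecast F(t) from Y(t-1) and its state, following Rules 1-3."""
    row = P[prev_state]
    if row.sum() == 0:                       # Rule 1: no FLR for this state
        return float(midpoints[prev_state])
    if np.isclose(row.max(), 1.0):           # Rule 2: one-to-one relationship
        return float(midpoints[int(row.argmax())])
    m = np.array(midpoints, dtype=float)     # Rule 3: one-to-many relationship
    m[prev_state] = prev_value               # m_i is replaced with Y(t-1)
    return float(row @ m)

def adjusted_forecast(initial, actual, prev_actual):
    """F'(t) = F(t) + (Y(t) - Y(t-1)), the adjustment step."""
    return initial + (actual - prev_actual)

# Toy example (illustrative values only).
toy_states = [0, 0, 1, 2, 1, 1, 0]
toy_midpoints = [10, 20, 30]
P = transition_matrix(toy_states, n_states=3)
f_rule3 = initial_forecast(12, 0, P, toy_midpoints)   # 0.5 * 12 + 0.5 * 20
f_rule2 = initial_forecast(31, 2, P, toy_midpoints)   # state 2 only moves to state 1
print(P)
print(f_rule3, f_rule2, adjusted_forecast(f_rule3, 14, 12))
```

Note that the adjustment uses the actual value of the period being forecast, exactly as the adjustment equation is written, so the sketch reproduces the described procedure rather than an out-of-sample forecast.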
Euclidean distance
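Euclidean distance is a method for calculating the distance between points in Euclidean space, which is subsequently used to group these points into clusters based on their proximity [25].
| $d(x_i, c_j) = \sqrt{\sum_{l=1}^{p} (x_{il} - c_{jl})^2}$ | (10) |
where:
$d(x_i, c_j)$: Euclidean distance between $x_i$ and $c_j$
$c_j$: the $j$-th cluster center value
$x_i$: the $i$-th actual data value
$p$: the number of variables (here $p = 2$, SO₂ and CO)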
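As a worked check, the short Python snippet below computes the distance from the first observation (SO₂ = 29, CO = 6 on 01/01/2021) to the five initial medoids reported in Table 5. The variable names are illustrative, and decimal points replace the decimal commas used in the tables.

```python
import math

# Initial medoids (SO2, CO) as reported in Table 5.
initial_medoids = [(18, 9), (30, 11), (44, 12), (46, 20), (50, 22)]

def euclidean(x, c):
    """Euclidean distance between an observation x and a cluster center c."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

observation = (29, 6)   # first row of Table 6
costs = [round(euclidean(observation, m), 2) for m in initial_medoids]
proximity = min(costs)                          # distance to the nearest medoid
nearest_cluster = costs.index(proximity) + 1    # 1-based cluster number
print(costs, proximity, nearest_cluster)
```

The five values agree with the Cost1 to Cost5 entries of the first row of Table 6, and the smallest one is the Proximity entry.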
Gap statistics
The gap statistic method is used to determine the optimal number of clusters. It achieves this by comparing the intra-cluster variation of the actual data with the expected variation from randomly generated data. The method pairs well with the CLARA algorithm, which handles large datasets efficiently through sampling. The gap statistic value is calculated using the following equation:
| $\mathrm{Gap}(k) = \dfrac{1}{B} \sum_{b=1}^{B} \log\left(W_{kb}^{*}\right) - \log\left(W_k\right)$ | (11) |
where:
| $\mathrm{Gap}(k)$ | The gap statistic for $k$ clusters |
| $B$ | The number of bootstrap samples used in the gap statistic method |
| $W_{kb}^{*}$ | The intra-cluster dispersion for $k$ clusters in the $b$-th bootstrap sample |
| $W_k$ | The within-cluster variation for $k$ clusters in the original dataset |
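A rough sketch of the gap computation is given below. It rests on several assumptions that are not taken from the study: a basic Lloyd k-means stands in for CLARA as the clustering step, the reference data are drawn uniformly over the bounding box of the observations, the data are toy blobs, and the defaults (`k_max=4`, `B=10`) are far smaller than the 15 clusters and 100 bootstrap iterations reported for Fig. 1.

```python
import numpy as np

def within_dispersion(X, k, rng, n_iter=25):
    """W_k: sum of squared distances to the nearest center (basic Lloyd k-means)."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):      # an empty cluster keeps its old center
                centers[j] = X[labels == j].mean(axis=0)
    sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return sq.min(axis=1).sum()

def gap_statistic(X, k_max=4, B=10, seed=0):
    """Gap(k) = mean over b of log(W*_kb) minus log(W_k), for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps = []
    for k in range(1, k_max + 1):
        log_w = np.log(within_dispersion(X, k, rng))
        log_w_ref = [
            np.log(within_dispersion(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(B)
        ]
        gaps.append(np.mean(log_w_ref) - log_w)
    return np.array(gaps)

# Toy data: three separated 2-D blobs (illustrative, not the Jakarta data).
demo_rng = np.random.default_rng(1)
toy = np.vstack([demo_rng.normal(c, 0.5, size=(30, 2)) for c in ((0, 0), (6, 0), (3, 6))])
gaps = gap_statistic(toy)
print(np.round(gaps, 3))
```

A larger gap value indicates that the data are clustered more tightly than the reference data for that $k$; the rule used to pick the final $k$ from these values is left open here.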
Clustering large applications
Clustering Large Applications (CLARA) utilizes medoids as cluster centers to group large-scale data and is robust against outliers [26]. CLARA divides large datasets into smaller subsets while ensuring optimal medoid selection. The sample size for each subset is determined using the following equation:
| $n_{\text{sample}} = 40 + 2k$ | (12) |
where:
| $k$ | Number of clusters. |
The fundamental principle of the Partitioning Around Medoids (PAM) algorithm is to minimize the dissimilarity between objects within a cluster by iteratively swapping medoid and non-medoid objects until convergence [27]. Typically, the search for a new medoid is repeated until the medoid with the smallest total distance representing the cluster is found. The formula for evaluating the medoid swap is:
| $S = \text{total cost}_{\text{new}} - \text{total cost}_{\text{old}}$ | (13) |
where:
| If $S < 0$ | the medoid swap is repeated |
| If $S > 0$ | the iteration stops |
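The sampling and swap logic can be sketched as follows. This is a simplified illustration, assuming a sample size of 40 + 2k, a greedy PAM-style swap loop, and a small fixed number of random draws; the function names (`total_cost`, `pam_on_sample`, `clara`) and the toy data are hypothetical and do not come from the study.

```python
import numpy as np

def total_cost(X, medoids):
    """Sum over all objects of the Euclidean distance to the nearest medoid."""
    d = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
    return d.min(axis=1).sum()

def pam_on_sample(sample, k, rng):
    """Greedy PAM-style search: accept a swap whenever S = new - old < 0."""
    idx = [int(i) for i in rng.choice(len(sample), size=k, replace=False)]
    best = total_cost(sample, sample[idx])
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(sample)):
                if cand in idx:
                    continue
                trial = idx.copy()
                trial[pos] = cand
                cost = total_cost(sample, sample[trial])
                if cost - best < 0:          # S < 0: keep the swap, search again
                    idx, best, improved = trial, cost, True
    return sample[idx]

def clara(X, k, n_draws=5, seed=0):
    """Run PAM on several random samples; keep the medoids cheapest on all of X."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    size = min(len(X), 40 + 2 * k)           # sample size per draw
    best_medoids, best_cost = None, np.inf
    for _ in range(n_draws):
        sample = X[rng.choice(len(X), size=size, replace=False)]
        medoids = pam_on_sample(sample, k, rng)
        cost = total_cost(X, medoids)        # evaluated on the full dataset
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

# Toy data (illustrative only).
X_demo = np.random.default_rng(42).normal(size=(300, 2))
medoids_demo, cost_demo = clara(X_demo, k=3)
print(medoids_demo, round(cost_demo, 2))
```

Because each medoid is an actual observation rather than a mean, a few extreme values shift the cluster centers less than they would in k-means, which is the robustness property referred to above.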
Accuracy of the forecasting model
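The smaller the error of a forecasting model, the higher its accuracy [28,29]. Accuracy is measured with the following formulas, and the criteria are listed in Table 1:
| $\mathrm{MAE} = \dfrac{1}{n} \sum_{t=1}^{n} \lvert Y(t) - F'(t) \rvert$ | (14) |
| $\mathrm{RMSE} = \sqrt{\dfrac{1}{n} \sum_{t=1}^{n} \left( Y(t) - F'(t) \right)^2}$ | (15) |
where $Y(t)$ is the actual value, $F'(t)$ is the final forecast, and $n$ is the number of forecasted periods.
Table 1.
Model forecast accuracy criteria.
| Accuracy Value | Criterion |
|---|---|
| ≤ 10 | Excellent |
| 10 < Value ≤ 20 | Good |
| 20 < Value ≤ 50 | Fair |
| > 50 | Poor |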
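The two error measures are straightforward to compute. The sketch below applies them to the six CO rows that are displayed in Table 14 (actual value and final forecast); since this is only an excerpt of the 1339 observations, the results are illustrative and are not expected to match the full-sample values in Table 15. The function names are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error between actual values and forecasts."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error between actual values and forecasts."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Displayed CO rows of Table 14 (N = 2, 3, 4, 1337, 1338, 1339).
co_actual = [7, 7, 4, 26, 28, 25]
co_final = [9.83, 9.45, 6.45, 23.29, 24.58, 20.88]

print(round(mae(co_actual, co_final), 2), round(rmse(co_actual, co_final), 2))
```

RMSE is never smaller than MAE on the same data, and the gap between them widens when a few errors are much larger than the rest.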
Method validation
In forecasting sulfur dioxide and carbon monoxide air pollution in Jakarta using the hybrid CLARA and FTSMC, we utilized secondary data, specifically the daily air quality index from January 2021 to August 31, 2024. The dataset includes two dependent variables. The details of these variables are provided in Table 2.
Table 2.
Variable description.
| Variable | Description |
|---|---|
| SO₂ | Daily sulfur dioxide air pollution data |
| CO | Daily carbon monoxide air pollution data |
Descriptive analytics
Table 3.
Descriptive Analysis of SO₂ and CO Pollution in Jakarta from January 1, 2021 to August 31, 2024.
| Variable | Minimum | Maximum | Mean |
|---|---|---|---|
| SO₂ | 8 | 112 | 36,27 |
| CO | 3 | 55 | 15,12 |
Selection of optimal cluster number
The gap statistic is well suited to the CLARA algorithm, which works with large datasets. It determines the optimal number of clusters by comparing the clustering results with those expected from random reference data. For the Jakarta data, this yields five clusters (Fig. 1).
Fig. 1.
Determination of the optimal number of clusters using Gap Statistics, with a maximum of 15 clusters and bootstrapping of 100 iterations.
Clustering large applications analysis
Based on Eq. (12), the number of samples drawn from the actual data is determined in the process of selecting the optimal medoids (Tables 4 and 6–9).
Table 4.
Samples of the CLARA algorithm in selecting the initial medoids, where each sample represents a data sequence from the actual SO₂ and CO air pollution datasets.
Table 6.
Distance calculation between objects and initial medoids for SO₂ and CO air pollution data.
| N | Date | SO₂ | CO | Cost1 | Cost2 | Cost3 | Cost4 | Cost5 | Proximity |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 01/01/2021 | 29 | 6 | 11,40 | 5,10 | 16,16 | 22,02 | 26,40 | 5,10 |
| 2 | 02/01/2021 | 27 | 7 | 9,22 | 5,00 | 17,72 | 23,02 | 27,46 | 5,00 |
| 3 | 03/01/2021 | 25 | 7 | 7,28 | 6,40 | 19,65 | 24,70 | 29,15 | 6,40 |
| 4 | 04/01/2021 | 24 | 4 | 7,81 | 9,22 | 21,54 | 27,20 | 31,62 | 7,81 |
| 1336 | 28/08/2024 | 14 | 24 | 15,52 | 20,62 | 32,31 | 32,25 | 36,06 | 15,52 |
| 1337 | 29/08/2024 | 14 | 26 | 17,46 | 21,93 | 33,11 | 32,56 | 36,22 | 17,46 |
| 1338 | 30/08/2024 | 15 | 28 | 19,24 | 21,67 | 33,12 | 32,02 | 35,51 | 19,24 |
| 1339 | 31/08/2024 | 14 | 25 | 16,49 | 21,26 | 32,70 | 32,39 | 36,12 | 16,49 |
| Cost Total | | | | | | | | | 8.850,34 |
Table 7.
Samples of CLARA algorithm for new medoid selection, where each sample represents a data sequence on actual air pollution data for SO₂ and CO.
Table 8.
New medoids based on actual SO₂ and CO air pollution data samples.
| Cluster | Medoid SO₂ | Medoid CO | Label |
|---|---|---|---|
| 1 | 14 | 11 | Very Low |
| 2 | 25 | 12 | Low |
| 3 | 42 | 15 | Moderate |
| 4 | 49 | 24 | High |
| 5 | 51 | 25 | Very High |
Table 9.
Calculation of object distances to new medoids for SO₂ and CO air pollution data.
| N | Date | SO₂ | CO | Cost1 | Cost2 | Cost3 | Cost4 | Cost5 | Proximity |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 01/01/2021 | 29 | 6 | 15,81 | 7,21 | 15,81 | 26,91 | 29,07 | 7,21 |
| 2 | 02/01/2021 | 27 | 7 | 13,60 | 5,39 | 17,00 | 27,80 | 30,00 | 5,39 |
| 3 | 03/01/2021 | 25 | 7 | 11,70 | 5,00 | 18,79 | 29,41 | 31,62 | 5,00 |
| 4 | 04/01/2021 | 24 | 4 | 12,21 | 8,06 | 21,10 | 32,02 | 34,21 | 8,06 |
| 1336 | 28/08/2024 | 14 | 24 | 13,00 | 16,28 | 29,41 | 35,00 | 37,01 | 13,00 |
| 1337 | 29/08/2024 | 14 | 26 | 15,00 | 17,80 | 30,08 | 35,06 | 37,01 | 15,00 |
| 1338 | 30/08/2024 | 15 | 28 | 17,03 | 18,87 | 29,97 | 34,23 | 36,12 | 17,03 |
| 1339 | 31/08/2024 | 14 | 25 | 14,00 | 17,03 | 29,73 | 35,01 | 37,00 | 14,00 |
| Cost Total | | | | | | | | | 9.531,82 |
Selection of the new medoid
Comparison of initial and new medoid total Euclidean distance
The total Euclidean distance for the initial medoids is 8.850,34, while the total Euclidean distance for the new medoids is 9.531,82. Using Eq. (13), the result is:
S = 9.531,82 − 8.850,34 = 681,48
Since S > 0, the iteration stops, and the initial medoids become the final medoids for the air pollution data of SO₂ and CO in Jakarta.
Medoid intervals
The medoids in Table 5 serve as the basis for forming the intervals of the universe of discourse $U$ using Eqs. (3)–(5) (Table 10).
Table 5.
Initial medoids based on the sample data of SO₂ and CO air pollution.
| Cluster | Medoid SO₂ | Medoid CO | Label |
|---|---|---|---|
| 1 | 18 | 9 | Very Low |
| 2 | 30 | 11 | Low |
| 3 | 44 | 12 | Moderate |
| 4 | 46 | 20 | High |
| 5 | 50 | 22 | Very High |
Table 10.
Medoid intervals based on medoids in Table 5 for SO₂ and CO air pollution data.
| Interval (SO₂) | Midpoint (SO₂) | Interval (CO) | Midpoint (CO) |
|---|---|---|---|
| | $m_1$ = 16,0 | | $m_1$ = 6,5 |
| | $m_2$ = 30,5 | | $m_2$ = 10,75 |
| | $m_3$ = 41,0 | | $m_3$ = 13,75 |
| | $m_4$ = 58,0 | | $m_4$ = 18,5 |
| | $m_5$ = 91,5 | | $m_5$ = 38,0 |
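The CO midpoints in Table 10 can be reproduced from the CO medoids in Table 5 with the short sketch below. It assumes that inner boundaries lie halfway between consecutive medoids and that the outermost bounds are the observed minimum and maximum from Table 3 (3 and 55). The latter is an assumption about how the outer bounds are set, chosen because it matches the reported CO midpoints; only the CO column is checked here. The function names are hypothetical, and the boundary convention in `fuzzify` (a value on a boundary goes to the higher interval) is likewise an assumption.

```python
def intervals_from_medoids(medoids, data_min, data_max):
    """Intervals whose inner cuts are halfway between consecutive medoids."""
    centers = sorted(medoids)
    cuts = [(a + b) / 2 for a, b in zip(centers[:-1], centers[1:])]
    bounds = [data_min] + cuts + [data_max]     # outer bounds: assumed min and max
    return list(zip(bounds[:-1], bounds[1:]))

def fuzzify(value, intervals):
    """1-based index of the interval that contains value."""
    last = len(intervals)
    for i, (lo, hi) in enumerate(intervals, start=1):
        if lo <= value < hi or (i == last and value == hi):
            return i
    raise ValueError("value lies outside the universe of discourse")

co_intervals = intervals_from_medoids([9, 11, 12, 20, 22], data_min=3, data_max=55)
co_midpoints = [(lo + hi) / 2 for lo, hi in co_intervals]
print(co_intervals)
print(co_midpoints)
print(fuzzify(6, co_intervals), fuzzify(24, co_intervals))
```

Under these assumptions, a CO value of 6 falls into the first interval and a value of 24 into the fifth.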
Fuzzy time series Markov chain
Fuzzification, fuzzy logic relationship (FLR), and fuzzy logic relationship group (FLRG)
A fuzzy logical relationship (FLR) captures the relationship between consecutive fuzzy sets in the time series, as shown in Table 11. A fuzzy logical relationship group (FLRG) accumulates the FLRs that originate from the same fuzzy set, which helps reveal patterns in the historical data for forecasting purposes, as shown in Table 12.
Table 11.
Fuzzification and FLR results for SO₂ and CO air pollution data.
| N | Date | SO₂ | Fuzzification | FLR | CO | Fuzzification | FLR |
|---|---|---|---|---|---|---|---|
| 1 | 01/01/2021 | 29 | | | 6 | | |
| 2 | 02/01/2021 | 27 | | | 7 | | |
| 3 | 03/01/2021 | 25 | | | 7 | | |
| 4 | 04/01/2021 | 24 | | | 4 | | |
| 1336 | 28/08/2024 | 14 | | | 24 | | |
| 1337 | 29/08/2024 | 14 | | | 26 | | |
| 1338 | 30/08/2024 | 15 | | | 28 | | |
| 1339 | 31/08/2024 | 14 | | | 25 | | |
Table 12.
FLRG results for SO₂ and CO air pollution data.
| K | FLRG (SO₂) | FLRG (CO) |
|---|---|---|
| 1 | (276), (27) | (137), (42)(29), (11)(2) |
| 2 | (27), (218)(11), (5) | (53), (94)(81), (14)(2) |
| 3 | (1), (13)(217), (49) | (25), (87)(156), (79)(28) |
| 4 | (2)(52), (438)(1) | (5), (17)(74), (90)(58) |
| 5 | (1) | (4)(35), (50)(164) |
Transition probability matrix Markov
Based on the FLRG in Table 12, the next step is to form a 5 × 5 transition probability matrix based on Eq. (6), where $P_{\mathrm{SO_2}}$ is the transition probability matrix for sulfur dioxide and $P_{\mathrm{CO}}$ is the transition probability matrix for carbon monoxide.
Hybrid CLARA FTSMC forecasting results for SO₂ and CO air pollution
Defuzzification is performed in two stages: initial forecasting and adjustment of the forecast. The results of the defuzzification process are shown in Tables 13 and 14.
Table 13.
Hybrid CLARA FTSMC forecasting results for SO₂ air pollution in Jakarta.
| N | Date | SO₂ | Fuzzification | Initial Forecast | Final Forecast |
|---|---|---|---|---|---|
| 1 | 01/01/2021 | 29 | | | |
| 2 | 02/01/2021 | 27 | | 56,45 | 49,45 |
| 3 | 03/01/2021 | 25 | | 45,53 | 57,53 |
| 4 | 04/01/2021 | 24 | | 58,89 | 48,89 |
| 1337 | 29/08/2024 | 14 | | 15,47 | 15,47 |
| 1338 | 30/08/2024 | 15 | | 15,47 | 15,47 |
| 1339 | 31/08/2024 | 14 | | 15,47 | 16,47 |
| 1340 | 01/09/2024 | - | - | - | 16,47 |
Table 14.
Forecast results for CO air pollution in Jakarta using Hybrid CLARA FTSMC.
| N | Date | CO | Fuzzification | Initial Forecast | Final Forecast |
|---|---|---|---|---|---|
| 1 | 01/01/2021 | 6 | | | |
| 2 | 02/01/2021 | 7 | | 8,83 | 9,83 |
| 3 | 03/01/2021 | 7 | | 9,45 | 9,45 |
| 4 | 04/01/2021 | 4 | | 9,45 | 6,45 |
| 1337 | 29/08/2024 | 26 | | 21,29 | 23,29 |
| 1338 | 30/08/2024 | 28 | | 22,58 | 24,58 |
| 1339 | 31/08/2024 | 25 | | 23,88 | 20,88 |
| 1340 | 01/09/2024 | - | - | - | 20,88 |
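The last three CO rows of Table 14 can be re-derived as a worked check of the one-to-many rule and the adjustment step. The sketch assumes that the four counts listed for state 5 in the CO column of Table 12 (4, 35, 50, 164) are transitions to states 2, 3, 4 and 5, in that order; the table does not show the target states, so this mapping is an assumption. Midpoints come from Table 10 and actual values from Table 14 and the row for 28/08/2024 in Table 6.

```python
# CO midpoints for states 2-4 (Table 10) and assumed transition counts out of state 5.
co_midpoints = {2: 10.75, 3: 13.75, 4: 18.5}
state5_counts = {2: 4, 3: 35, 4: 50, 5: 164}    # target states are an assumption
total = sum(state5_counts.values())             # 253 transitions out of state 5

def initial_forecast_state5(prev_actual):
    """One-to-many rule: midpoints weighted by P_5j, with m_5 replaced by Y(t-1)."""
    others = sum(co_midpoints[j] * n for j, n in state5_counts.items() if j != 5)
    return (others + prev_actual * state5_counts[5]) / total

# (Y(t-1), Y(t)) for N = 1337, 1338, 1339; every Y(t-1) lies in state 5.
pairs = [(24, 26), (26, 28), (28, 25)]
results = []
for prev, cur in pairs:
    initial = initial_forecast_state5(prev)
    final = initial + (cur - prev)              # adjustment by Y(t) - Y(t-1)
    results.append((round(initial, 2), round(final, 2)))
print(results)
```

With these inputs the sketch returns the initial and final forecasts listed for rows 1337 to 1339 of Table 14.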
The predicted values of SO₂ obtained with the CLARA-FTSMC hybrid method agree well with the actual data. The predicted value for September 1, 2024 was 16,47, while the actual value reported by the DKI Jakarta Environment Agency for the same day was 12. In addition, the forecasting accuracy assessed by the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) is classified as excellent (Fig. 2).
Fig. 2.
Graph of actual data and forecasted SO₂ values using Hybrid CLARA FTSMC.
The predicted values of CO obtained with the CLARA-FTSMC hybrid method agree well with the actual data. The predicted value for September 1, 2024 was 20,88, while the actual value reported by the DKI Jakarta Environment Agency for the same day was 22. In addition, the forecasting accuracy assessed by the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) is classified as excellent (Fig. 3 and Table 15).
Fig. 3.
Graph of actual data and forecasted CO values using Hybrid CLARA FTSMC.
Table 15.
Forecast model accuracy for SO₂ and CO air pollution in Jakarta using Hybrid CLARA FTSMC.
| SO₂ MAE | SO₂ RMSE | CO MAE | CO RMSE |
|---|---|---|---|
| 1,19 | 1,63 | 3,17 | 4,66 |
Conclusion
Based on the descriptive analysis of daily air pollution data for SO₂ and CO in Jakarta (as shown in Table 3), the SO₂ data from January 1, 2021 to August 31, 2024 had an average value of 36,27, with a minimum of 8 and a maximum of 112. Similarly, the CO data had an average value of 15,12, with a minimum of 3 and a maximum of 55.
This research utilizes a hybrid methodology that combines Clustering Large Applications (CLARA) and Fuzzy Time Series Markov Chain (FTSMC). The gap statistic suggests five as the optimal number of clusters, and medoid selection terminated at the initial medoids. This process generated a 5 × 5 Markov transition probability matrix to model the data.
Predicted values for SO₂ and CO using the CLARA-FTSMC hybrid method showed strong alignment with the actual data. For September 1, 2024, the predicted value of SO₂ was 16,47, while the actual value reported by the DKI Jakarta Environment Agency was 12. Similarly, for CO, the predicted value was 20,88, compared with the actual value of 22. In addition, the forecasting accuracy, evaluated by Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), was classified as excellent.
Limitations
1. Model prediction accuracy is assessed using only Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).
2. This research focuses on a single area in Indonesia, namely Jakarta, which limits the generalizability of the results to other areas that may have different environmental conditions.
3. The forecasting approach relies only on the dependent variables and does not incorporate independent variables that could influence the results. Including such variables could improve the accuracy and robustness of the predictions.
Ethics statements
The data utilized in this study were sourced from the official website of the DKI Jakarta Environment Agency (https://lingkunganhidup.jakarta.go.id/). The dataset comprises daily air pollution standard index records spanning from January 2021 to August 31, 2024.
CRediT authorship contribution statement
Nurtiti Sunusi: Conceptualization, Methodology, Software, Writing – original draft, Visualization. Ankaz As Sikib: Conceptualization, Methodology, Writing – review & editing, Validation, Supervision. Sumanta Pasari: Conceptualization, Methodology, Writing – review & editing.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Data availability
Data will be made available on request.
References
- 1. Kingsy G.R., Manimegalai R., Geetha D.M.S., Rajathi S., Usha K., Raabiathul B.N. Proceedings of the IEEE Region 10 Annual International Conference/TENCON. 2017. Air pollution analysis using enhanced K-Means clustering algorithm for real time sensor data; pp. 1945–1949. August 2006.
- 2. Simkovich S.M., Goodman D., Roa C., Crocker M.E., Gianella G.E., Kirenga B.J., et al. The health and social implications of household air pollution and respiratory diseases. NPJ Prim. Care Respir. Med. 2019;29(1):1–17. doi: 10.1038/s41533-019-0126-x.
- 3. Wu S., Ni Y., Li H., Pan L., Yang D., Baccarelli A.A., et al. Short-term exposure to high ambient air pollution increases airway inflammation and respiratory symptoms in chronic obstructive pulmonary disease patients in Beijing, China. Environ. Int. 2016;94:76–82. doi: 10.1016/j.envint.2016.05.004.
- 4. IQAir. IQAir; 2023. World Air Quality Report 2023; pp. 1–45. Available from: https://www.iqair.com/id/world-most-polluted-countries
- 5. IQAir. Ranking of the most polluted big cities directly [Internet]. 2024. Available from: https://www.iqair.com/id/world-air-quality-ranking
- 6. Maung T.Z., Bishop J.E., Holt E., Turner A.M., Pfrang C. Indoor air pollution and the health of vulnerable groups: a systematic review focused on particulate matter (PM), volatile organic compounds (VOCs) and their effects on children and people with pre-existing lung disease. Int. J. Environ. Res. Public Health. 2022;19(14) doi: 10.3390/ijerph19148752.
- 7. Muttaqin M.Z., Herwangi Y., Susetyo C., Sefrus T., Subair M. Public transport performance based on the potential demand and service area (case study: Jakarta Public Transport). Daengku J. Humanit. Soc. Sci. Innov. 2021;1(1):1–7.
- 8. Zhang R., Ashuri B., Deng Y. A novel method for forecasting time series based on fuzzy logic and visibility graph. Adv. Data Anal. Classif. 2017;11(4):759–783.
- 9. Cheng S.H., Chen S.M., Jian W.S. Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics SMC 2015. 2016. A novel fuzzy time series forecasting method based on fuzzy logical relationships and similarity measures; pp. 2250–2254.
- 10. Song Q., Chissom B.S. Forecasting enrollments with fuzzy time series–part I. Fuzzy Sets Syst. 1993;54(1):1–9.
- 11. Yu H.K. Weighted fuzzy time series models for TAIEX forecasting. Phys. A Stat. Mech. Appl. 2005;349(3–4):609–624.
- 12. Chen S.M. Forecasting enrollments based on fuzzy time series. Fuzzy Sets Syst. 2006;4287:324–336. LNCS.
- 13. Sullivan J., Woodall W.H. A comparison of fuzzy forecasting and Markov modeling. Fuzzy Sets Syst. 1994;64(3):279–293.
- 14. Cheng C.H., Cheng G.W., Wang J.W. Multi-attribute fuzzy time series method based on fuzzy clustering. Expert Syst. Appl. 2008;34(2):1235–1242.
- 15. Alyousifi Y., Othman M., Sokkalingam R., Faye I., Silva P.C.L. Predicting daily air pollution index based on fuzzy time series Markov chain model. Symmetry (Basel). 2020;12(2):1–18.
- 16. Ramadani K., Devianto D. 2020. The forecasting model of bitcoin price with fuzzy time series Markov chain and Chen logical method; p. 2296. (November)
- 17. Zaenurrohman H.S., Udjiani T. Fuzzy time series Markov Chain and Fuzzy time series Chen & Hsu for forecasting. J. Phys. Conf. Ser. 2021;1943(1)
- 18. Mubarrok M.N., Nuryanto U.W., Fika R., Adi P., Tanati A.E. Fuzzy time series Markov chain for Rice production forecasting. Bp. Int. Res. Crit. Inst. J. 2022;5(3):27148–27154.
- 19. Vovan T., Lethithu T. A fuzzy time series model based on improved fuzzy function and cluster analysis problem. Commun. Math. Stat. 2022;10(1):51–66. doi: 10.1007/s40304-019-00203-5.
- 20. Gentle J.E., Kaufman L., Rousseeuw P.J. Finding groups in data: an introduction to cluster analysis. Biometrics. 1991;47:788.
- 21. Efendi R., Ismail Z., Deris M.M. A new linguistic out-sample approach of fuzzy time series for daily forecasting of Malaysian electricity load demand. Appl. Soft Comput. J. 2015;28:422–430. doi: 10.1016/j.asoc.2014.11.043.
- 22. Li N., Kolmanovsky I., Girard A., Filev D. Fuzzy encoded Markov chains: overview, observer theory, and applications. IEEE Trans. Syst. Man Cybern. Syst. 2021;51(1):116–130.
- 23. Dewi D.A., Surono S., Thinakaran R., Nurraihan A. Hybrid fuzzy K-medoids and cat and mouse-based optimizer for Markov Weighted fuzzy Time Series. Symmetry (Basel). 2023;15(8)
- 24. Surono S., Goh K.W., Onn C.W., Nurraihan A., Siregar N.S., Saeid A.B., et al. Optimization of Markov weighted fuzzy time series forecasting using genetic algorithm (GA) and particle swarm optimization (PSO). Emerg. Sci. J. 2022;6(6):1375–1393.
- 25. Alguliyev R.M., Aliguliyev R.M., Sukhostat L.V. Weighted consensus clustering and its application to big data. Expert Syst. Appl. 2020;150 doi: 10.1016/j.eswa.2020.113294.
- 26. Gupta T., Panda S.P. Proceedings of the International Conference on Machine Learning, Big Data, Cloud and Parallel Computing. 2019. Clustering validation of CLARA and K-means using silhouette DUNN measures on iris dataset; pp. 10–13. Trends, Prespectives Prospect Com 2019.
- 27. Arora P., Deepali V.S. Analysis of K-means and K-medoids algorithm for big data. Phys. Procedia. 2016;78(December 2015):507–512. doi: 10.1016/j.procs.2016.02.095.
- 28. Hodson T.O. Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not. Geosci. Model Dev. 2022;15(14):5481–5487.
- 29. Sunusi N. Bias of automatic weather parameter measurement in monsoon area, a case study in Makassar coast. 2022;10(June):1–15.