MethodsX. 2025 Jan 3;14:103152. doi: 10.1016/j.mex.2024.103152

Machine learning and Fuzzy logic fusion approach for osteoporosis risk prediction

Rabia Khushal 1, Dr Ubaida Fatima 1
PMCID: PMC11764049  PMID: 39866197

Abstract

Osteoporosis is a metabolic disorder that affects a very large number of individuals globally. Its progression can be slowed by modifying lifestyle risk factors and by following appropriate treatment. This research work considers the modifiable risk factors of osteoporosis. All of these variables are binary and therefore provide incomplete information. Machine learning applied directly to these factors took a large computation time and showed poor accuracy. A fuzzy concept has therefore been introduced, leading to a fusion of machine learning and fuzzy logic. Three binary variables of the considered dataset are compared to produce one fuzzy input, which also captures the uncertainty of the binary variables. Because three input variables are transformed into one, the number of features is reduced, which improves computation time and accuracy. Moreover, the approach guides the individual to modify lifestyle factors in order to slow disease progression or reduce the risk of osteoporosis. The proposed model is validated on the diabetes risk prediction dataset.

  • The study examines modifiable binary risk factors for osteoporosis, such as diet, smoking, and exercise.

  • A fusion of machine learning and fuzzy logic is introduced to improve accuracy and reduce computation time.

  • The model, which condenses three binary inputs into one, is validated using a diabetes risk prediction dataset.

Keywords: Machine learning, Fuzzy logic, Osteoporosis, Diabetes

Method name: Fuzzy data transformation technique using three variables, fuzzy machine learning logic model for crisp output


Specifications table

Subject area: Computer Science
More specific subject area: Fuzzy logic, Machine learning
Name of your method: Fuzzy data transformation technique using three variables, fuzzy machine learning logic model for crisp output
Name and reference of original method: Fuzzy data transformation technique using two variables, fuzzy machine learning logic model for fuzzy output (Rabia Khushal, 2024)
Resource availability: Data will be available on request

Background

Osteoporosis is a metabolic disorder of the skeletal system that currently affects about 200 million people globally. According to WHO criteria, osteoporosis is defined as low bone mineral density, with a T-score ≤ −2.5 measured at the spine, the femoral neck, or the total hip [1]. This bone disorder causes noticeable alterations in the biological composition of the bone, subsequently disrupting the bone's structure [2]. This age-related disorder increases the risk of fragility fractures, which often lead to hospitalization, reduced mobility, and loss of independence [3]. The risk factors of osteoporosis are divided into non-modifiable risk factors (age, gender, menopause, ethnicity, etc.) and modifiable risk factors (alcohol, smoking, deficiency of vitamin D and calcium, sedentary lifestyle, etc.) [1-4]. By modifying the modifiable risk factors, the risk of osteoporosis can be minimized. Moreover, if the patient already suffers from osteoporosis, modifying these risk factors along with proper treatment can significantly slow the progression of the disease [1].

Machine learning and its related subfields have gained popularity worldwide due to their ability to analyze and learn from big data [5]. In K-nearest neighbor (KNN), a new case is assigned to the category that is most similar to the existing cases, based on its resemblance to the available examples. A Decision Tree (DT) is tree-structured, with internal nodes standing for dataset attributes, branches for decision rules, and leaf nodes for outcomes. Random forest (RF) is founded on the idea of ensemble learning, which merges several classifiers to enhance the model's performance and solve a challenging problem. The support vector machine (SVM) method seeks the optimal decision boundary that divides n-dimensional space into classes. This optimal decision boundary is referred to as a hyperplane. Gradient boosting (GB) is an ensemble method that converts weak learners into strong ones [6]. Fuzzy logic was introduced by Zadeh to capture the uncertain side of real-world problems. In brief, a fuzzy logic system first fuzzifies the crisp input, so that it contains both the certain and the uncertain side of the variable. Following fuzzification, the fuzzy input is fed into the inference engine, where the rules stored in the rule repository are used to produce the fuzzy output, which is afterwards defuzzified to produce a crisp output [7].

The work related to this research is as follows. In [8], a fusion of machine learning and fuzzy logic is developed to estimate the chances of developing polycystic ovary syndrome (PCOS). Different metrics and the computation time were computed, and the results were found to be satisfactory. Likewise, in [9] and [10], fuzzy logic and machine learning are used for the prediction of a healthy lifestyle and of diabetes risk, respectively. In [11], a predictive machine learning model for diabetes risk assessment is developed employing health indicators and lifestyle factors. Including health indicators along with lifestyle factors improved the accuracy of that model significantly. In [12], a predictive model based on machine learning techniques is developed for early detection of osteoporosis risk in individuals. For predicting the specifications of a bone scaffold, a fuzzy modeling system has been employed [13] to model the effects of inputs such as weight percentage, elastic modulus, porosity, and Poisson ratio on the output variable, which is the compressive strength. A machine learning model is developed in [14] for the prediction of osteoporosis in rheumatoid arthritis patients, and different metrics have been computed. A novel global clustering coefficient-dependent degree centrality metric is introduced in [15] and applied to different datasets, where it showed improved results. A decision support system based on machine learning and explainable artificial intelligence has been developed in [16] for the prediction of osteoporosis risk. The explainable artificial intelligence tools provided interpretability and the rationale behind the classifier predictions.

Method details

Data description

The dataset considered in this research work is the osteoporosis risk prediction dataset, which is taken from the Kaggle website [17] and is visualized in Fig. 1. The dataset consists of 14 input features, which are categorical and numerical. The output feature is a binary variable. Information has been collected from 1958 patients.

Fig. 1. Considered dataset description visualized on Python 3.8.8.

Data preprocessing and feature selection

Data preprocessing is a crucial step that puts the data into an appropriate format for training the model. The categorical variables of the dataset have been encoded. For instance, the gender variable consists of two values, male and female. These values must be encoded before machine learning techniques are applied. Hence, after encoding, male is shown as 0 and female is shown as 1. Likewise, all the categorical variables have been encoded. In addition, null values have been replaced with the median value, and columns such as “id” have been removed. The dataset is imbalanced. It could be balanced using different techniques; however, balancing techniques can lead to overfitting and introduce noise [18]. For the considered osteoporosis risk prediction dataset, the feature selection step has been carried out using the literature. Modifiable risk factors have been considered for the proposed model, namely calcium intake, vitamin D intake, body weight, physical activity, smoking, and alcohol consumption [1-4]. These risk determinants are considered equally important for osteoporosis risk prediction.
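The preprocessing steps above can be sketched as follows. This is a minimal illustration that assumes pandas and a small toy table; the column names (`id`, `gender`, `smoking`, `age`, `osteoporosis`) and the category spellings are hypothetical and may not match the Kaggle file.

```python
import pandas as pd

# Toy stand-in for the dataset; names and values are assumptions.
raw = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "gender": ["Male", "Female", "Female", "Male"],
    "smoking": ["Yes", "No", "No", "Yes"],
    "age": [61.0, None, 47.0, 55.0],
    "osteoporosis": [1, 0, 0, 1],
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    out = df.drop(columns=["id"])  # identifier column is removed
    # Encode categorical values as 0/1 (male -> 0, female -> 1, as in the text).
    out["gender"] = out["gender"].map({"Male": 0, "Female": 1})
    out["smoking"] = out["smoking"].map({"No": 0, "Yes": 1})
    # Replace null values with the median of each column.
    return out.fillna(out.median(numeric_only=True))

clean = preprocess(raw)
print(clean)
```

The mapping dictionaries make the encoding explicit, so the meaning of 0 and 1 stays visible when the binary variables are compared later.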

Machine learning implementation on the considered dataset

The considered dataset is divided into training data and testing data. The usual division is used: 80 % of the data is used for training the model and 20 % is used for testing it. The machine learning techniques briefly described in the Background section, namely KNN, GB, DT, RF, and SVC, have been applied to the selected modifiable risk factors. These machine learning techniques have been used as classifiers. The accuracy and computation time of each technique have been calculated. Accuracy is defined as the ratio of correct predictions to all input samples [6].
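A minimal sketch of this evaluation loop is given below, assuming scikit-learn with default hyperparameters and synthetic binary data in place of the real dataset. The article does not state how computation time was measured; timing the fit and predict calls together is an assumption made here.

```python
import time
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: six binary features and a binary label (not the real data).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6))
y = rng.integers(0, 2, size=400)

# 80 % training data, 20 % testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "KNN": KNeighborsClassifier(),
    "GB": GradientBoostingClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    # Accuracy = correct predictions / all test samples.
    results[name] = (accuracy_score(y_test, predictions), elapsed)

for name, (acc, seconds) in results.items():
    print(f"{name}: accuracy={acc:.2f}, time={seconds:.2f}s")
```

Because the labels here are random, the accuracies carry no meaning; only the structure of the split, fit, predict, and timing steps is illustrated.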

Fuzzy concept implementation on the considered dataset

The six selected modifiable binary features extracted from the considered dataset are used to predict osteoporosis risk in individuals. These factors are equally important for the prediction of osteoporosis risk; however, machine learning applied directly to these features shows poor accuracy and takes more computation time. Introducing a fuzzy concept reduces this problem. The binary variables are modified according to Eq. (1).

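\[
y_n =
\begin{cases}
0 & \text{iff } x_n = 0 \text{ AND } x_{n+1} = 0 \text{ AND } x_{n+2} = 0 \\
1 & \text{iff } x_n = 1 \text{ AND } x_{n+1} = 1 \text{ AND } x_{n+2} = 1 \\
0.5 & \text{iff } x_n \neq x_{n+1} \text{ OR } x_{n+1} \neq x_{n+2} \text{ OR } x_{n+2} \neq x_n
\end{cases}
\tag{1}
\]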

Here y_n is the modified output variable and x_n is the available binary input. The values of x_n, x_(n+1), and x_(n+2) are compared: if all three have the same value, that value is assigned to y_n; when any variable has a different value, 0.5 is assigned to y_n. The values 0, 1, and 0.5 are the fuzzy membership degrees assigned to the fuzzy input variable y_n. Membership degree 1 shows that every factor is present, 0 shows that every factor is absent, and 0.5 shows that the factors are only partially present, that is, at least one factor is present and at least one is absent. The collective impact of the variables is analyzed with this technique. Consider body weight as x1, calcium intake as x2, and vitamin D intake as x3. These variables are compared and their output variable is called risk determinant 1 (y1(x1, x2, x3)), as represented in Table 1. Likewise, consider regular physical activity as x4, smoking as x5, and alcohol consumption as x6.

Table 1. First five values of crisp transformation of binary variables into risk determinant 1.

| Body weight (x1) | Calcium intake (x2) | Vitamin D intake (x3) | Risk determinant 1 (y1) |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 0.5 |
| 1 | 0 | 0 | 0.5 |
| 0 | 1 | 1 | 0.5 |
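The transformation of Eq. (1) can be sketched in a few lines of Python. The function name `fuzzify` is a hypothetical helper introduced for illustration; the input rows are the five rows of Table 1.

```python
def fuzzify(a: int, b: int, c: int) -> float:
    """Collapse three binary values into one membership degree, as in Eq. (1)."""
    if a == b == c:
        return float(a)  # all absent -> 0.0, all present -> 1.0
    return 0.5           # the three values disagree

# Rows of Table 1: (body weight x1, calcium intake x2, vitamin D intake x3)
table_1_rows = [(1, 1, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0), (0, 1, 1)]
y1 = [fuzzify(*row) for row in table_1_rows]
print(y1)  # [1.0, 1.0, 0.5, 0.5, 0.5]
```

The result matches the risk determinant 1 column of Table 1.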

Smoking (x5) and alcohol consumption (x6) are negated before the comparison, because in these variables 0 shows a good habit (not smoking, not consuming alcohol) and 1 shows a bad habit (smoking, consuming alcohol). The values of x4, ¬x5, and ¬x6 are then compared, where in ¬x5 0 shows smoking and 1 shows no smoking, and in ¬x6 0 shows alcohol consumption and 1 shows no alcohol consumption. Their output is called risk determinant 2 (y2(x4, ¬x5, ¬x6)), as represented in Table 2.

Table 2. First five values of crisp transformation of binary variables into risk determinant 2.

| Physical activity (x4) | No smoking (¬x5) | No alcohol consumption (¬x6) | Risk determinant 2 (y2) |
|---|---|---|---|
| 1 | 0 | 1 | 0.5 |
| 1 | 1 | 0 | 0.5 |
| 0 | 1 | 1 | 0.5 |
| 1 | 0 | 0 | 0.5 |
| 0 | 0 | 0 | 0 |
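The negation step can be illustrated as below. The raw (x4, x5, x6) values are not listed in the article; they are back-derived here from the negated columns of Table 2, and `fuzzify` is the same hypothetical helper that implements Eq. (1).

```python
def fuzzify(a: int, b: int, c: int) -> float:
    """Eq. (1): equal values keep their value, mixed values give 0.5."""
    return float(a) if a == b == c else 0.5

def negate(bit: int) -> int:
    """Flip a binary value so that 1 always denotes the healthy habit."""
    return 1 - bit

# Raw rows (physical activity x4, smoking x5, alcohol x6), inferred from Table 2.
raw_rows = [(1, 1, 0), (1, 0, 1), (0, 0, 0), (1, 1, 1), (0, 1, 1)]

y2 = [fuzzify(x4, negate(x5), negate(x6)) for x4, x5, x6 in raw_rows]
print(y2)  # [0.5, 0.5, 0.5, 0.5, 0.0]
```

Without the negation, the fourth row (active, smoking, drinking) would be read as "every factor present" and receive 1, which is why the habits are aligned before comparison.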

The comparison criterion for risk determinants 1 and 2 is as follows: if all three binary input factors are present, 1 is assigned to the output variable; if all are absent, 0 is assigned; and if any one factor differs from the others, 0.5 is assigned. In this way, the crisp binary input variables are fuzzified. The osteoporosis risk output is then obtained as represented in Table 3. If risk determinant 1 is present and risk determinant 2 is present, the individual does not have a risk of osteoporosis (0). If risk determinant 1 is absent and risk determinant 2 is absent, the individual has a risk of osteoporosis (1). If risk determinant 1 is partially absent and risk determinant 2 is partially absent, the individual may have a risk of osteoporosis (0.75). If risk determinant 1 is present and risk determinant 2 is partially present, the individual may not have a risk of osteoporosis (0.25).

Table 3. First five values of fuzzy output and crisp output of fuzzy modified dataset.

| Risk determinant 1 (y1) | Risk determinant 2 (y2) | Osteoporosis risk (fuzzy output) | Osteoporosis risk (crisp output) |
|---|---|---|---|
| 1 | 0.5 | 0.25 | 0 |
| 1 | 0.5 | 0.25 | 0 |
| 0.5 | 0.5 | 0.75 | 1 |
| 0.5 | 0.5 | 0.75 | 1 |
| 0.5 | 0 | 0.75 | 1 |
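The mapping from the two risk determinants to the fuzzy and crisp outputs can be sketched as a lookup followed by a threshold. The first four rule entries follow the text above and the fifth is read from the last row of Table 3. The article does not state the defuzzification rule; a cut-off of 0.5 is an assumption that happens to reproduce Table 3, and the names `FUZZY_RULES`, `fuzzy_risk`, and `defuzzify` are hypothetical.

```python
from typing import Optional

# (y1, y2) -> fuzzy osteoporosis risk. Pairs not listed are unspecified in the source.
FUZZY_RULES = {
    (1.0, 1.0): 0.0,   # both present: no risk
    (0.0, 0.0): 1.0,   # both absent: risk
    (0.5, 0.5): 0.75,  # both partially absent: may have a risk
    (1.0, 0.5): 0.25,  # present and partially present: may not have a risk
    (0.5, 0.0): 0.75,  # taken from the last row of Table 3
}

def fuzzy_risk(y1: float, y2: float) -> Optional[float]:
    """Return the fuzzy output, or None when no rule is given for the pair."""
    return FUZZY_RULES.get((y1, y2))

def defuzzify(fuzzy_output: float, threshold: float = 0.5) -> int:
    """Assumed rule: membership at or above the threshold becomes risk (1)."""
    return int(fuzzy_output >= threshold)

table_3_inputs = [(1.0, 0.5), (1.0, 0.5), (0.5, 0.5), (0.5, 0.5), (0.5, 0.0)]
outputs = []
for y1, y2 in table_3_inputs:
    fuzzy = fuzzy_risk(y1, y2)
    outputs.append((fuzzy, defuzzify(fuzzy)))
print(outputs)
```

Returning `None` for unlisted pairs keeps the sketch from inventing outputs for combinations that the source leaves open.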

Implementation of fuzzy logic in machine learning approach

The healthcare dataset of osteoporosis risk prediction has been transformed by a fuzzy logic approach. The considered dataset is transformed (fuzzified) by comparing three binary variables with each other. After the transformation, the dataset goes through the inference engine, where inference is carried out by the machine learning techniques KNN, GB, DT, RF, and SVC. According to the rules stored in the rule repository, as represented in Table 4, the osteoporosis risk is predicted in a fuzzy format, which is then defuzzified into a crisp output variable. The whole process is visualized in Fig. 2. Because the inference in the fuzzy inference system is carried out by machine learning techniques, machine learning and fuzzy logic are integrated in this way.

Table 4. Depiction of rules for osteoporosis risk prediction.

| Rule # | Rules |
|---|---|
| 1 | If risk determinant 1 is present and risk determinant 2 is present then the individual does not have a risk of osteoporosis. |
| 2 | If risk determinant 1 is absent and risk determinant 2 is absent then the individual has a risk of osteoporosis. |
| 3 | If risk determinant 1 is partially absent and risk determinant 2 is partially absent then the individual may have a risk of osteoporosis. |
| 4 | If risk determinant 1 is present and risk determinant 2 is absent then the individual may have a risk of osteoporosis. |
| 5 | If risk determinant 1 is present and risk determinant 2 is partially present then the individual may not have a risk of osteoporosis. |

Fig. 2. Machine learning and fuzzy fusion approach for real-life datasets.
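The fusion pipeline (fuzzify, infer with a machine learning classifier, defuzzify) might be sketched as follows. This is an illustration under several assumptions and not the authors' implementation: it uses a scikit-learn decision tree as the inference step, a handful of toy training pairs built from the stated rules, integer-encoded fuzzy labels, and an assumed 0.5 cut-off for defuzzification.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fuzzify(a: int, b: int, c: int) -> float:
    """Eq. (1): equal values keep their value, mixed values give 0.5."""
    return float(a) if a == b == c else 0.5

# Toy fuzzified inputs (y1, y2) and fuzzy targets taken from the rules in the
# text and Table 3; this is not the real dataset.
pairs = [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5), (1.0, 0.5), (0.5, 0.0)] * 4
targets = [0.0, 1.0, 0.75, 0.25, 0.75] * 4

X = np.array(pairs)
# scikit-learn classifiers expect discrete labels, so 0.25 -> 25, 0.75 -> 75, etc.
y = np.rint(np.array(targets) * 100).astype(int)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

def predict_crisp(binary_features, threshold: float = 0.5):
    """Return (fuzzy output, crisp output) for six raw binary features."""
    x1, x2, x3, x4, x5, x6 = binary_features
    y1 = fuzzify(x1, x2, x3)
    y2 = fuzzify(x4, 1 - x5, 1 - x6)              # negate smoking and alcohol
    fuzzy = float(clf.predict([[y1, y2]])[0]) / 100  # back to a membership degree
    return fuzzy, int(fuzzy >= threshold)         # assumed 0.5 cut-off

print(predict_crisp((1, 1, 1, 1, 1, 0)))
```

Any of the five classifiers could take the place of the decision tree; only two fuzzy inputs are passed to the model instead of six binary ones, which is the feature reduction the method relies on.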

Evaluation of method

The machine learning techniques have first been applied to the normal osteoporosis risk prediction dataset. Afterwards they are applied to the fuzzy transformed osteoporosis risk prediction dataset, which produces an output in a fuzzy format that is defuzzified into a crisp output. The accuracy and computation time of the models are represented in Table 5. On the normal dataset, all crucial features are considered as separate binary inputs, and the accuracy values are poor. By incorporating fuzzy concepts, the fuzzification of binary features transforms three variables into one. As a result, the accuracy of all machine learning techniques improves significantly and the computation time is reduced for most techniques. This leads to the conclusion that the proposed methodology is effective and can be applied to real-life datasets to increase accuracy, optimize computation time, and take all crucial features into account.

Table 5. Accuracies and computation time of considered osteoporosis risk prediction dataset.

| Machine learning technique | Normal dataset: accuracy | Normal dataset: computation time (s) | Fuzzy transformed dataset: accuracy | Fuzzy transformed dataset: computation time (s) |
|---|---|---|---|---|
| KNN | 0.49 | 0.27 | 0.99 | 0.27 |
| GB | 0.53 | 1.06 | 0.99 | 0.71 |
| DT | 0.49 | 0.14 | 0.99 | 0.09 |
| RF | 0.53 | 0.88 | 0.99 | 0.75 |
| SVC | 0.52 | 0.76 | 0.99 | 0.29 |

Method validation

The dataset considered for validation is the diabetes risk prediction dataset [19], which is taken from the UCI machine learning repository. All variables are binary. Features have been selected using a correlation matrix. The selected features are polyuria (x1), polydipsia (x2), sudden weight loss (x3), genital thrush (x4), irritability (x5), and partial paresis (x6). The first three variables are transformed into risk determinant 1 (y1) and the remaining three are transformed into risk determinant 2 (y2). Risk determinants 1 and 2 are used for diabetes risk prediction according to the literature, and the results are found to be satisfactory, as represented in Table 6.

Table 6. Accuracies and computation time of validation dataset.

| Machine learning technique | Normal dataset: accuracy | Normal dataset: computation time (s) | Fuzzy transformed dataset: accuracy | Fuzzy transformed dataset: computation time (s) |
|---|---|---|---|---|
| KNN | 0.89 | 1.05 | 0.99 | 0.5 |
| GB | 0.95 | 1.7 | 0.99 | 1.1 |
| DT | 0.90 | 1.6 | 0.99 | 0.6 |
| RF | 0.91 | 1.3 | 0.99 | 0.8 |
| SVC | 0.93 | 0.7 | 0.99 | 0.4 |

Conclusion

The risk of osteoporosis, a bone disorder, can be minimized by modifying the modifiable risk factors. A fusion of machine learning and fuzzy logic is applied to the osteoporosis risk prediction dataset. The binary input variables are modified by comparing three variables with each other. As a result, the three variables are transformed into a single variable, which increases the accuracy of the model, and since the number of features is reduced, the computation time of the model is also reduced. Moreover, by following a fuzzy concept, the uncertain side of the binary input variables is also considered. Machine learning techniques have been applied both to the selected features of the considered dataset and to the fuzzy modified dataset. The machine learning techniques applied to normal datasets show poor accuracy and take a large computation time, whereas machine learning applied to fuzzy datasets shows excellent accuracy and an optimized computation time. The proposed fusion methodology has also been applied to the diabetes risk prediction dataset, and the results are found to be satisfactory. It is therefore concluded that the proposed fusion approach can be applied to other healthcare datasets and also to datasets of other domains.

Limitations

One limitation of the method is that domain knowledge is necessary for designing the fuzzy rules. For instance, to predict a certain disease, one should know the symptoms and causes of that particular disease. Another limitation is that not all variables can be compared. Only variables that are related to each other to some extent are compared.

Ethics statements

Not Applicable.

CRediT author statement

Rabia Khushal: Writing – original draft, Visualization, Validation, Data curation. Ubaida Fatima: Writing – review & editing, Methodology, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Footnotes

Related research article: None.

For a published article: Rabia Khushal, Ubaida Fatima, Fuzzy machine learning logic utilization on hormonal imbalance dataset, Computers in Biology and Medicine, 174, 2024, https://doi.org/10.1016/j.compbiomed.2024.108429.

Data availability

Data will be made available on request.

References

  • 1. Tański W., Kosiorowska J., Szymańska-Chabowska A. Osteoporosis – risk factors, pharmaceutical and non-pharmaceutical treatment. Eur. Rev. Med. Pharmacol. Sci. 2021;25(9). doi: 10.26355/eurrev_202105_25838.
  • 2. Pouresmaeili F., Kamalidehghan B., Kamarehei M., Goh Y.M. A comprehensive overview on osteoporosis and its risk factors. Ther. Clin. Risk Manag. 2018. doi: 10.2147/TCRM.S138000.
  • 3. Wilson-Barnes S.L., Lanham-New S.A., Lambert H. Modifiable risk factors for bone health & fragility fractures. Best Pract. Res. Clin. Rheumatol. 2022;36(3). doi: 10.1016/j.berh.2022.101758.
  • 4. "Osteoporosis," [Online]. Available: https://www.osteoporosis.foundation/health-professionals/about-osteoporosis. [Accessed June 2024].
  • 5. Yalug B.B. Chapter 19 - Prospect of Data Science and Artificial Intelligence For Patient-Specific Neuroprostheses. Somatosensory Feedback for Neuroprosthetics; 2021. Dilek Betul Arslan, Esin Ozturk-Isik.
  • 6. "Machine learning techniques," [Online]. Available: https://www.javatpoint.com/machine-learning-techniques. [Accessed June 2024].
  • 7. Surucu O., Gadsden S.A., Yawney J. Condition monitoring using machine learning: a review of theory, applications, and recent advances. Expert Syst. Appl. 2023;221.
  • 8. Khushal R., Fatima U. Fuzzy machine learning logic utilization on hormonal imbalance dataset. Comput. Biol. Med. 2024;174. doi: 10.1016/j.compbiomed.2024.108429.
  • 9. Khushal R., Fatima U. Fuzzy computing in healthcare. 2024 International Visualization, Informatics and Technology Conference (IVIT); Kuala Lumpur, Malaysia; 2024.
  • 10. Ahmed U., Issa G.F., Aftab S., Khan M.F. Prediction of diabetes empowered with fused machine learning. IEEE Access. 2022;10:8529–8538.
  • 11. Kausalya Nandan T.P., Ramesh G., Saicharan Reddy G., ShivaRam G., Anusha M., Shafi S. Predicting diabetes with integrated health-lifestyle fusion. 1st International Conference on Trends in Engineering Systems and Technologies (ICTEST); Kochi; 2024.
  • 12. Tu J.-B., Liao W.-J., Liu W.-C., Gao X.-H. Using machine learning techniques to predict the risk of osteoporosis based on nationwide chronic disease data. Sci. Rep. 2024;14(1). doi: 10.1038/s41598-024-56114-1.
  • 13. Moshayedi A.J., Taheri M., Heidari A., Alreda B.A., Yuan Y., Heidarshenas B., Toghraie D. Fuzzy modeling and characterization of mechanical and biological properties of a selective laser melting shape: a comprehensive study. Opt. Laser Technol. 2024;170.
  • 14. Lee C., Joo G., Shin S., Im H., Moon K.W. Prediction of osteoporosis in patients with rheumatoid arthritis using machine learning. Sci. Rep. 2023;13(1). doi: 10.1038/s41598-023-48842-7.
  • 15. Fatima U., Hina S., Wasif M. A novel global clustering coefficient-dependent degree centrality (GCCDC) metric for large network analysis using real-world datasets. J. Comput. Sci. 2023;70.
  • 16. Khanna V.V., Chadaga K., Sampathila N., Chadaga R., Prabhu S., Swathi K.S., Jagdale A.S., Bhat D. A decision support system for osteoporosis risk prediction using machine learning and explainable artificial intelligence. Heliyon. 2023;9(12). doi: 10.1016/j.heliyon.2023.e22456.
  • 17. Kulkarni A. Osteoporosis Risk Prediction. Kaggle; 2024. [Online]. Available: https://www.kaggle.com/datasets/amitvkulkarni/lifestyle-factors-influencing-osteoporosis
  • 18. Buribayev Z., Yerkos A., Shaikalamova S., Imanbek R., Zhetpisbay Z. Improving medical diagnosis with a hybrid balancing technique. J. Prob. Comput. Sci. Inform. Technolog. 2024.
  • 19. early_stage_diabetes_risk_prediction, UCI Machine learning repository, 2020.


Articles from MethodsX are provided here courtesy of Elsevier