Scientific Reports
. 2025 Nov 21;15:41260. doi: 10.1038/s41598-025-25100-6

Using convolutional neural networks with late fusion to predict heart disease

Deema Mohammed AlSekait 1, Mohammed Zakariah 2, Syed Umar Amin 3, Parul Dubey 4,, Zafar Iqbal Khan 3
PMCID: PMC12638869  PMID: 41271968

Abstract

Cardiovascular diseases are responsible for one-third of all deaths that occur globally. Machine learning and data mining have made it easier and quicker for physicians to diagnose and identify patients. This article presents a novel late fusion method over convolutional neural networks for predicting heart disease. The data was sourced from the UCI Machine Learning Repository; the sample comprises 303 instances and 13 features. The ‘late fusion’ deep learning technique combines data from multiple sources to produce more accurate predictions: specialized models perform independent analyses, and their results are aggregated to produce the final forecast. CNNs are designed with specificity in mind to handle different kinds of data modalities, whereas DNNs can obtain comprehensive information and analyze tabular data. We found that our model was highly accurate in identifying heart disease by combining CNNs and DNNs effectively. A new hybrid architecture was designed to merge numerical features with graphics, capturing the dataset’s spatial and sequential properties. After careful assessment, the model achieved accuracy, precision, recall, and F1 scores of 99.99% on the validation and test sets. This work’s contribution to medical diagnostics provides a strong foundation for further exploration and simplifies the task of creating precise and extensible algorithms for predicting heart disease, thereby enhancing patient health and improving medical data management.

Keywords: Heart disease prediction, Machine learning, Heart disease Cleveland UCI dataset, Cardiovascular diseases, Deep neural network, Convolutional neural network

Subject terms: Cardiology, Cardiac device therapy

Introduction

One of the most serious chronic illnesses in the world is heart disease1,2. Reduced blood flow to body organs, especially the heart, is the root cause of heart disease3. Heart failure develops as a result of coronary artery constriction or blockages that decrease blood flow4. Dizziness, dyspnea, limb edema, physical weakness, and chest pain are among the signs of heart disease5. Although the leading underlying cause of heart disease, atherosclerosis or plaque accumulation in the arteries, begins early in life, symptoms generally appear after a person is over 50 years of age. Deaths from cardiovascular disease (CVD) rose by roughly 19% from 2012 to 2022, to approximately 19 million. The World Health Organization (WHO) estimated that 19.70 million people worldwide died from cardiovascular disease in 2022. Studies published by the European Society of Cardiology (ESC) suggest that around 4.4 million patients are diagnosed with heart disease every year10,11; half of these patients die within the first one to three years. In addition to a lack of specialist expertise, many patients with CVD cannot afford effective treatment. The following subsections outline recent scientific advances in cardiac disease diagnosis using deep learning.

Furthermore, according to reference12, asynchronous federated learning (AFL) has been suggested by researchers as a possible way to boost the prediction of cardiovascular disease (CVD) using machine learning. By facilitating distributed learning across multiple institutions without the need for data exchange, AFL protects data security and privacy. Moreover, the QMBC algorithm, as detailed by the authors in13, is a powerful machine learning tool that combines the advantages of binary classification methods with the Quine-McCluskey method. This approach efficiently addresses feature selection, dimensionality reduction, and complexity management, achieving better prediction accuracy. The interpretability of the QMBC algorithm enables clinicians to assess the accuracy of the predictions and to use the results to inform clinical decision-making and personalize treatment for each patient. The QMBC algorithm’s use of binary classifiers enables the early detection of heart disease and the provision of personalized care. Because of this integration, the occurrence of such diseases can be predicted with greater precision and interpretability. With the introduction of a fresh perspective, the current work holds the potential to advance the field of cardiovascular disease prediction. The application of combined optimization and machine learning methods to enhance the accuracy of heart disease prediction has garnered increased scholarly interest14,15.

Consequently, decision trees16, random forests17, support vector machines18, and artificial neural networks19,20 fall under an algorithmic subgroup that can efficiently analyze and forecast large datasets. There are some problems to be addressed, such as improving the accuracy of these algorithms. Methods for finding the best answers and thus improving prediction accuracy while reducing the number of false positives and false negatives can be implemented using optimization methods, such as particle swarm optimization21, genetic algorithms21, or metaheuristic algorithms22. The purpose of this project is to investigate the potential application of machine learning techniques and optimization algorithms to improve decision-making, thereby accelerating the deployment of personalized preventive and therapeutic approaches in the medical field. The work is specialized in cardiac disease prediction.

Moreover, the Late Fusion CNN model aims to improve the prediction accuracy of cardiac disease through the integration of convolutional neural networks and late fusion techniques23. The patient records, which include medical history, ECG signals, and images, enhance the predictive power of the model, leading to improved accuracy in identifying individuals at risk for cardiovascular disease. Thus, in this research study, the literature on using machine learning techniques for the prediction of cardiac disease has been thoroughly reviewed. The data discovery and cleaning algorithm is analyzed and used in the Late Fusion Convolutional Neural Network (CNN) model.

In this study, we introduce the use of late fusion to improve cardiovascular disease (CVD) prediction models, achieving accuracy that improves on routine centralized machine learning models. The proposed solution is compared and evaluated against earlier federated learning algorithms on a comprehensive dataset drawn from several centers, covering a wide variety of health conditions. By identifying patients at high risk for cardiovascular disease and assisting in planning the early administration of therapies that reduce its effects, the approach can inform patient care decisions and personalized medicine, thereby improving patient care.

Figure 1 shows the late fusion convolutional neural network framework for predicting cardiac disease. The framework starts from the UCI dataset. To ensure that the data are of sufficient quality, pre-processing involves data cleaning, formatting, and structuring; this stage addresses the inconsistencies and defects that may impact the accuracy of the forecasts. After pre-processing, the technique is implemented: the CNN uses a late fusion model during this phase. Late fusion combines multiple modalities or models to enhance prediction accuracy. The CNN serves as the basic model because it can extract essential information from the input. The architecture then adds CNN and DNN models, a fusion layer, and an output layer. The fusion layer combines data from the CNN and DNN, and the output layer uses the fused input to make the final prediction. In the last step, the model’s performance, accuracy, and efficacy are assessed using metrics such as accuracy, precision, recall, and F1 score. The CNN framework’s late fusion method is tested to determine whether the model can accurately predict heart disease.

Fig. 1.

Fig. 1

Novel heart disease prediction framework based on late fusion across convolutional neural networks.
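As a concrete illustration of the late-fusion principle shown in Fig. 1 — independent models each produce a prediction, and their outputs are combined only at the decision stage — the sketch below fuses the class probabilities of two simple stand-in classifiers. The synthetic data, the choice of classifiers, and the averaging rule are illustrative assumptions only; the paper's actual branches are a CNN and a DNN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 303-instance, 13-feature UCI heart disease table
X, y = make_classification(n_samples=303, n_features=13, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=0)

# Two independent "branch" models, each trained separately on the data
branch_a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
branch_b = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Late fusion: average the branches' class probabilities, then decide
fused = (branch_a.predict_proba(X_te) + branch_b.predict_proba(X_te)) / 2
y_pred = fused.argmax(axis=1)
```

Fusing predictions rather than raw features keeps each branch's training independent, which is what makes it straightforward to add further modalities later, as the contributions below note.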

Our approach has made some of the following contributions:

  1. We recommend the late fusion approach, a technique for fusing information from different modalities at later stages in the CNN architecture. In traditional attention models, each modality shares all the feature extraction layers; however, this process can result in the loss of discriminative information during feature fusion. This approach also makes it possible to introduce extra modalities with less difficulty as the need arises in the future.

  2. We conduct a comprehensive analysis of the late fusion CNN model based on experimental results obtained with the UCI dataset, comparing it with existing approaches that utilize data or perform early fusion. The measures are accuracy, precision, recall, and the F1-score of the model, which enables a thorough assessment of the model.

  3. The confusion matrix shows that both types of patients, with and without a history of heart disease, were correctly classified.
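The evaluation metrics named above (accuracy, precision, recall, F1-score) all follow directly from a 2×2 confusion matrix. A minimal sketch with hypothetical counts, not the paper's reported results:

```python
import numpy as np

# Hypothetical 2x2 confusion matrix: rows = actual class, cols = predicted class
#                 pred no-disease  pred disease
cm = np.array([[27,  1],    # actual no-disease (TN, FP)
               [ 2, 31]])   # actual disease    (FN, TP)

tn, fp, fn, tp = cm.ravel()
accuracy  = (tp + tn) / cm.sum()                       # fraction correct overall
precision = tp / (tp + fp)                             # how many flagged cases are real
recall    = tp / (tp + fn)                             # how many real cases are caught
f1        = 2 * precision * recall / (precision + recall)
```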

The remainder of this paper is organized as follows: Section II reviews the literature, covering previous studies on the topic under investigation. Section III describes the dataset. Section IV describes the methodology used in the present study. Section V presents the experiments performed and the conclusions drawn from the two models developed. Section VI provides an extensive discussion. Finally, the conclusion is presented in Section VII.

Literature review

Researchers have developed several medical expert systems utilizing machine learning techniques to identify cardiovascular disease. These systems were subsequently published in scholarly journals, accompanied by further documentation. To improve the accuracy of the findings, a large proportion of these studies carefully handled missing data in the dataset, a crucial step in the data preparation stage.

The authors of10 used machine learning techniques to increase forecast accuracy for heart disease. Six algorithms were applied to datasets from the IEEE Dataport and Cleveland repositories. On the Cleveland dataset, logistic regression outperformed other methods, with an accuracy rate of 90.16%; on the IEEE Dataport dataset, AdaBoost achieved an accuracy rate of 90%, making it the best individual algorithm there. Increased accuracy was obtained by integrating all six strategies into a soft voting ensemble classifier: the accuracy on the IEEE Dataport dataset was 95%, and on the Cleveland dataset 93.44%.

In a similar vein, the authors of11 published a system for inferring an individual’s vulnerability to CVD that combines deep learning algorithms with feature augmentation methodologies. The dataset used in this study has only eleven characteristics. Feature augmentation was performed with a sparse autoencoder to extract additional features and enlarge the feature space. Compared to other modern approaches, the proposed methodology produces a performance increase of 4.4%, yielding a 90% accuracy rate, a significant improvement. Furthermore, the authors of13 proposed a novel ensemble method, called QMBC, to differentiate between individuals diagnosed with heart disease and those without. The QMBC model makes use of seven different models: logistic regression, decision tree, random forest, K-nearest neighbor, naive Bayes, support vector machine, and multi-layer perceptron. Applied to datasets with two classes, this technique gives perfect results. The study compares the effectiveness of the seven individual machine learning models and a voting classifier on the Cleveland, cardiovascular, and HD datasets. The researchers applied feature extraction and selection methods to improve prediction effectiveness. With a remarkable accuracy rate of 98.36%, the QMBC model, which fuses Principal Component Analysis (PCA) and Analysis of Variance (ANOVA) for feature selection, proved compelling.

Moreover, the use of explainable AI (XAI) has proven effective in disease diagnosis in healthcare. DeepXplainer14 integrated XGBoost and a deep convolutional neural network (CNN) with a SHAP explainer for lung cancer classification and achieved an overall accuracy of 97.43%, a sensitivity of 98.71%, and an F1 score of 98.08%. Similarly, DiaXplain15, which uses a CNN for feature extraction, XGBoost for classification, and SHAP for explainability in Type-2 diabetes diagnosis, achieved accuracy, precision, and F1-score of 98.24%, 95.12%, and 97.50%, respectively. In addition, a systematic survey of 105 AI models with Internet of Medical Things (IoMT) data showed growing interest in the transparency and interpretability of models for clinical purposes16.

Moreover, the authors of25 proposed a novel method for forecasting cardiac arrest, known as asynchronous federated deep learning. This method utilizes a DNN with an asynchronous learning strategy on a heart disease dataset with 76 unique features and 303 records. Their experimental results reveal that, across two datasets, the proposed asynchronous federated deep learning method outperforms the baseline strategy in terms of communication cost and model correctness. With the Sync-FL technique, the accuracy obtained for datasets DS1 and DS2 is roughly 0.888 and 0.893, respectively; with the Async-FL technique, it is approximately 0.869 and 0.871, respectively.

In their investigation, the authors of26 used a hybrid strategy with a 1D CNN. Using information from an online survey, the researchers assembled a substantial dataset and employed feature selection techniques to identify relevant traits. The UCI repository, MIMIC, and MIT-BIH datasets are among the datasets used in this investigation. Compared to other modern machine learning techniques and artificial neural networks, the 1D CNN showed greater accuracy. Accuracy rates on the validation data were 80.1% for non-coronary heart disease (no-CHD) and 76.9% for coronary heart disease (CHD). In an earlier work27, a potent method was offered for successfully identifying cases of heart disease. The dataset used in that study comprises 918 observations, of which 508 cases are classified as having heart disease and 410 as normal. Five classification models—Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), and Linear Discriminant Analysis—are compared to determine the most effective model. According to the study, with a mean accuracy of 86.93%, the Random Forest model outperforms all other models as the best predictor of heart disease.

Furthermore, for increased prediction accuracy, the authors of28 proposed an ensemble technique that combined the predictive potential of different classifier models. To forecast and identify the recurrence of cardiovascular disease, this study combines five classifier models—SVM, ANN, NB, regression analysis, and RF—using ensemble learning. Records of cardiovascular data from Hungary and Cleveland were obtained from the UCI collection. This database has 76 attributes, of which 14 were selected after removing redundant and superfluous attributes. Every technique they used yielded noticeably better outcomes, with the RF ensemble strategy leading the pack in both accuracy and prediction likelihood. The RF method achieves the highest accuracy on the provided heart disease dataset (98.17%), whereas the regression analysis yields the lowest accuracy among all the deployed algorithms.

Additionally, the authors of29 used clinical data from individuals to diagnose coronary artery disease with a deep neural network. They focused on a dual method in which coronary artery disease (CAD) is first assessed using a deep neural network (DNN)30–32. They developed their proposed framework using several techniques, including deep neural networks (DNN) and case-based reasoning (CBR)33. The deep learning-based model achieves the highest prediction accuracy of 96.2%.

Machine learning has recently improved predictive models in various medical fields. A recent study34 examined feature selection approaches for HCC prediction based on clinical data. The research employed decision trees, neural networks, and SVM, yielding model accuracies of 96%, 97.33%, and 96.00%, respectively. Such high performance demonstrates the effectiveness of machine learning algorithms in early cancer detection and patient management.

In another area, machine learning has been used to predict abdominal fat distribution after cavitation treatment35. In that study, multiple regression analysis was employed to predict fat distribution, with Hyperopt and Optuna used to optimize the parameters, yielding R-squared values of 94.12% for visceral fat and 71.15% for subcutaneous fat. This demonstrates the effectiveness of hyperparameter-optimization methods in enhancing the predictive value of post-treatment assessments.

Additionally, deep learning has been applied to epilepsy research through feature scaling and the incorporation of dropout layers36. Nine deep-learning models were trained on EEG recordings, and all of them demonstrated relatively high accuracy; BiLSTM and GRU achieved the best outcomes, with accuracies of 0.983 and 0.984, respectively. This shows the impact of advanced neural network architectures on seizure identification performance. Finally, machine learning was used to predict pelvic tilt and lumbar angles in women with pelvic floor dysfunction37. The algorithms included decision trees and AdaBoost; AdaBoost achieved a high R-squared value of 0.944 for pelvic rotation, and decision trees achieved 0.976 for the lumbar angle. Such results illustrate the success of the machine learning approach across a broad range of musculoskeletal disorders.

Moreover, Abdel et al.38 reviewed core muscle changes in connection with FSD using various machine learning models; CNN offered the highest performance, with an R2 of 0.988. In another study, Hassan et al.39 improved disease classification through symptom analysis using language models. The presented MCN-BERT model with the AdamW optimizer reached accuracies of 98.33% and 95.15% on the two datasets, slightly surpassing the performance of BiLSTM with Hyperopt. Furthermore, Abdel et al.40 employed a machine learning approach to investigate trunk displacement in women with postpartum low back pain, achieving 100% accuracy for several indices using the Basic CNN and Random Forest Classifier.

Table 1 presents a comprehensive compilation of prior academic publications that have employed late fusion approaches in conjunction with convolutional neural networks (CNNs) to effectively identify and diagnose heart disease.

Table 1.

List of past references.

References | Identifier/dataset | Methodology | Results
10 | UC Irvine ML Repository (Cleveland) datasets I and II; 302 distinct instances, with 164 heart disease patients and approximately 138 without | Machine learning: RF classifier, K-nearest neighbor classifier, gradient boosting, logistic regression, naïve Bayes, ensemble classifier | Accuracy ranges from 93.44% to 95%
11 | 918 samples with 11 distinct clinical features (around 508 with heart issues, 410 healthy) | Deep learning: CNN, MLP, RF, DT, AdaBoost, XGBoost | Accuracy of 90.88%
13 | UCI dataset (up to 14 of 76 features); HD dataset; CVD dataset comprising approximately 700,000 patient records, 11 attributes, and a target variable | Machine learning: voting classifier, QMBC technique | Accuracy of 98.36%
14 | Open-source “Survey Lung Cancer” dataset processed for analysis | DeepXplainer combines CNN and XGBoost with SHAP for explanations | Accuracy 97.43%, sensitivity 98.71%, F1-score 98.08%
15 | Data from the National Health and Nutrition Examination Survey (NHANES) for diabetes diagnosis | DiaXplain integrates CNN for feature extraction, XGBoost for classification, and SHAP for explainability | Accuracy 98.24%, precision 95.12%, F1-score 97.50%, with transparent, interpretable predictions
16 | Analysis of 105 published models using clinical data for diagnosing disorders with IoMT integration | Survey of scholarly articles, AI models, and techniques from 2004 to 2024 | Models classified by input data, methodology, and IoMT integration; focus on explainability in healthcare
25 | DS1 and DS2 datasets from the UCI and Switzerland machine learning repositories; 303 records with 76 distinct features | Asynchronous federated learning cardiac prediction, deep neural network (DNN), machine learning | Sync-FL: accuracy approximately 0.888 (DS1) and 0.893 (DS2); Async-FL: approximately 0.869 (DS1) and 0.871 (DS2)
26 | UCI repository dataset, MIMIC dataset, and MIT-BIH dataset | Machine learning, CNN, 1D-CNN model | Accuracy ranges between 76% and 80.1%
27 | 918 observations, of which 508 had heart disease and 410 were normal | Machine learning: NB, SVM, RF, LR, and Linear Discriminant Analysis | Accuracy of 86.93%
28 | UCI data repository: 76 attributes, with 14 attributes selected | Machine learning: NB, SVM, RF, regression analysis, ensemble method | Best accuracy of 98.17%
29 | Medical dataset of 335 instances, each with 36 clinical attributes | Machine learning: DNN, RF, linear regression | DNN with Gaussian noise: accuracy of 96.2%
34 | Clinical features of HCC patients | Feature reduction, applied machine learning algorithms, performance comparison | Decision trees 96%, neural networks 97.33%, SVM 96.00%
35 | Abdominal fat measurements and cavitation treatment parameters | Regression analysis with Hyperopt and Optuna for predictive accuracy | R-squared: 94.12% visceral fat, 71.15% subcutaneous fat
36 | EEG recordings from multiple subjects | Nine deep learning architectures trained with varying preprocessing techniques | Accuracy: BiLSTM 0.983, GRU 0.984
37 | Core muscle activity in multiparous women with pelvic floor dysfunction | Decision tree, SVM, random forest, AdaBoost for predicting pelvic parameters | AdaBoost R2 = 0.944; lumbar angle: decision tree R2 = 0.976
39 | Core muscle changes during FSD, analyzed with machine learning models | MLP, LSTM, CNN, RNN, ElasticNetCV, random forest, SVR, Bagging | CNN achieved the highest R2 of 0.988
40 | Dataset-1: 1,200 disease-symptom pairs; Dataset-2: 23,516 ADR/non-ADR tweets | MCN-BERT with AdamP and AdamW, BiLSTM with Hyperopt | MCN-BERT AdamW: 98.33% (Dataset-1), 95.15% (Dataset-2); BiLSTM Hyperopt: 97.08% (Dataset-1), 94.15% (Dataset-2)
41 | Patient records with pain, range of motion, and movements | Basic CNN and Random Forest classifier analyzed predictive features | Perfect accuracy, AUC, precision, recall, F1-score; key features identified

Dataset

Data collection

Heart disease Cleveland UCI dataset

The dataset in question is available to a broad audience via the UCI Machine Learning Repository. David W. Aha, a researcher affiliated with the Cleveland Clinic Foundation’s Heart Disease Research Laboratory, first disseminated the content34.

  • The total number of cases recorded is 303.

  • The dataset consists of 13 features, excluding the target column.

  • The presence or absence of cardiac disease is the study’s primary variable, with 1 indicating the presence of the disease and 0 indicating its absence.

The features covered include the patient’s gender, age, resting blood pressure, cholesterol levels, type of chest pain, maximum heart rate, exercise-induced angina, fasting blood sugar, resting electrocardiogram results, ST depression caused by exercise relative to rest, the slope of the peak exercise ST segment, the number of major vessels colored by fluoroscopy, and the outcome of a thallium stress test. The goal of applying this data is to develop prediction algorithms that can categorize individuals into two groups based on whether they have heart disease or not.

For this work, the heart disease dataset was taken from the UCI Machine Learning Repository (Table 2). It comprises 303 instances and 13 features. Using this dataset, the proposed Late Fusion over Convolutional Neural Networks (CNN) model is trained and assessed, with the data divided into training and test sets in an 80:20 ratio.
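The 80:20 split described above can be reproduced with scikit-learn; the random arrays below are placeholders standing in for the actual 303×13 feature table.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 303-row, 13-feature table
rng = np.random.default_rng(42)
X = rng.random((303, 13))
y = rng.integers(0, 2, size=303)

# 80:20 train/test split (test size is rounded up, giving 242 / 61 rows)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

print(len(X_train), len(X_test))  # 242 61
```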

Table 2.

Dataset of current work.

Feature | Description | Type | Range/Values
Age | Age of the patient in years | Integer | 29–77
Sex | Gender of the patient (1 = Male, 0 = Female) | Binary | 0, 1
Chest Pain Type | Type of chest pain experienced by the patient (1 = Typical Angina, 2 = Atypical Angina, 3 = Non-anginal Pain, 4 = Asymptomatic) | Categorical | 1–4
Resting BP | Resting blood pressure (in mm Hg) on admission to the hospital | Integer | 94–200
Cholesterol | Serum cholesterol in mg/dl | Integer | 126–564
Fasting Blood Sugar | Fasting blood sugar > 120 mg/dl (1 = True, 0 = False) | Binary | 0, 1
Resting ECG | Resting electrocardiographic results (0 = Normal, 1 = ST-T wave abnormality, 2 = Probable or definite left ventricular hypertrophy) | Categorical | 0–2
Max Heart Rate | Maximum heart rate achieved | Integer | 71–202
Exercise-induced Angina | Exercise-induced angina (1 = Yes, 0 = No) | Binary | 0, 1
Oldpeak | ST depression induced by exercise relative to rest | Float | 0.0–6.2
Slope | Slope of the peak exercise ST segment (1 = Upsloping, 2 = Flat, 3 = Downsloping) | Categorical | 1–3
Ca | Number of major vessels (0–3) colored by fluoroscopy | Integer | 0–3
Thal | Thalassemia status (3 = Normal, 6 = Fixed Defect, 7 = Reversible Defect) | Categorical | 3, 6, 7
Target | Diagnosis of heart disease (1 = Disease, 0 = No Disease) | Binary | 0, 1

An 80:20 split ratio separated the dataset into training and test sets. A table describing each set’s dimensions is shown below in Table 3:

Table 3.

Dataset dimensions.

Dataset portion | Number of instances | Percentage (%)
Total dataset | 303 | 100
Training set | 242 | 80
Test set | 61 | 20

Data description

The Cleveland UCI dataset comprises patients with cardiac issues and heart disease risk evaluations at the Cleveland Clinic Foundation. The information encompasses several elements related to the patient’s medical and demographic characteristics, as well as the presence or absence of cardiac disease. Figure 2 illustrates the whole set of variables contained in the dataset.

Fig. 2.

Fig. 2

All variables in the dataset.

Attributes:

  • Age: patient’s age in years, in numeric form.

  • Sex: patient’s gender (0: female, 1: male).

  • Cp: chest pain type (asymptomatic, non-anginal, typical angina, and atypical angina).

  • Trestbps: resting blood pressure in mm Hg (numeric).

  • Chol: serum cholesterol level in mg/dl (numeric).

  • Fbs: fasting blood sugar exceeds 120 mg/dl (1: true, 0: false).

  • Restecg: resting electrocardiogram findings (0: normal, 1: exhibiting ST-T wave abnormalities, 2: indicating possible or certain left ventricular hypertrophy).

  • Thalach: maximum heart rate achieved (numeric).

  • Exang: exercise-induced angina (1: yes, 0: no).

  • Oldpeak: exercise-induced ST depression compared to rest (numeric).

  • Slope: the slope of the peak exercise ST segment (upsloping, flat, and downsloping).

  • Ca: number of major vessels (0–3) colored by fluoroscopy (numeric).

  • Thal: thallium stress test results: normal, fixed defect, and reversible defect (3, 6, and 7, respectively).

  • Condition: the target variable, the presence or absence of heart disease (0: no disease, 1: heart disease).

Table 4, presented below, shows sample rows from the UCI Heart Disease Cleveland dataset.
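Before such attributes are fed to a model, the multi-category fields (cp, restecg, slope, thal) are typically one-hot encoded, while binary and numeric fields are left as-is. A minimal sketch of this common preprocessing step, using the dataset's column names with illustrative toy values:

```python
import pandas as pd

# Toy rows using the dataset's column names (values illustrative only)
df = pd.DataFrame({
    "age": [63, 51], "sex": [1, 0], "cp": [3, 2],
    "restecg": [0, 1], "slope": [1, 2], "thal": [3, 7],
    "condition": [1, 0],
})

# One-hot encode the multi-category attributes; binary/numeric columns stay as-is
encoded = pd.get_dummies(df, columns=["cp", "restecg", "slope", "thal"])
```

Each category value becomes its own indicator column (e.g. cp_2, cp_3), which avoids imposing a spurious ordering on nominal attributes.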

Table 4.

UCI heart disease Cleveland Dataset.

index age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal condition
0 69 1 0 160 234 1 2 131 0 0.1 1 1 0 0
1 69 0 0 140 239 0 0 151 0 1.8 0 2 0 0
2 66 0 0 150 226 0 0 114 0 2.6 2 0 0 0
3 65 1 0 138 282 1 2 174 0 1.4 1 1 0 1
4 64 1 0 110 211 0 2 144 1 1.8 1 0 0 0

The main objective of this dataset is to use the given parameters to forecast the likelihood of heart disease. This is a familiar example of a binary classification problem.

Further, the data is commonly employed in data mining, machine learning education, and research, and is a valuable resource for creating and testing prediction algorithms to detect heart disease. It is essential to recognize and consider the specific restrictions associated with each dataset. Since the data were collected in 1988, considerable medical progress has since been made. The information was gathered from patients who underwent cardiovascular exams at a specific hospital, whose characteristics may have differed from those of the overall population. Therefore, a prediction model derived from this dataset cannot be blindly used in a real-world medical setting; it must be rigorously evaluated and validated using current data on different patient cohorts.

EDA

Data analysis should involve data visualization techniques. To further understand the patterns, relationships, and inferences in the data, one needs to visualize it; complicated information can be better conveyed and understood through visual aids. Several data visualization methods help in understanding the Heart Disease Cleveland UCI data. Pair plots provide a visual representation of the relationships between numerical variables, a correlation heatmap graphically depicts the relationships among numeric features, and facet grid plots are used to explore the influence of aging on the maximum heart rate. Visualizations provide the necessary understanding of the properties of the data and the relationships among its different aspects, and they help detect the patterns or trends required to create predictive models.

  • Pair Plot of Numerical Features The pair plot visually represents the relationships between different numerical features in the dataset. Each subplot displays a scatter plot for a combination of two numerical features and the corresponding histograms for each feature on the diagonal.

  • Age vs. Resting Blood Pressure (treetops) There isn’t a clear linear relationship between age and resting blood pressure. However, there is a slightly higher concentration of data points around higher resting blood pressure values for older patients. Figure 3 displays the pair plots of specifically chosen numerical features.

  • Age vs. Cholesterol (chol) Like the previous pair, age and cholesterol levels do not have a strong linear correlation. Cholesterol levels show some variation across different age groups.

  • Age vs. Maximum Heart Rate (thalach) Age and maximum heart rate correlate negatively. Younger patients generally achieve higher maximum heart rates during exercise than older patients.

  • Age vs. ST Depression (oldpeak) The pair plot suggests no clear linear relationship between age and ST depression. There are variations in ST depression values across different age groups.

  • Resting Blood Pressure (trestbps) vs. Cholesterol (chol) There is no strong linear correlation between resting blood pressure and cholesterol levels. The data points are scattered with no apparent pattern.

  • Resting Blood Pressure (trestbps) vs. Maximum Heart Rate (thalach) There is no apparent linear correlation between resting blood pressure and maximum heart rate. The scatter plot shows no specific pattern or trend.

  • Resting Blood Pressure (trestbps) vs. ST Depression (oldpeak) There is no significant linear relationship between resting blood pressure and ST depression values.

  • Cholesterol (chol) vs. Maximum Heart Rate (thalach) The scatter plot between cholesterol and maximum heart rate indicates no apparent linear correlation.

  • Cholesterol (chol) vs. ST Depression (oldpeak) The pair plot shows no linear relationship between cholesterol levels and ST depression.

  • Maximum Heart Rate (thalach) vs. ST Depression (oldpeak) There seems to be a negative correlation between maximum heart rate and ST depression. Patients with higher maximum heart rates tend to have lower ST depression values.

Fig. 3.

Fig. 3

Pair plots of selected numerical features.

The pair plot facilitates the visualization of the associations between numerical information and enables the identification of probable correlations or patterns. Nevertheless, it is imperative to remember that this analysis may not yield conclusive inferences regarding causality or robust associations. The tool serves as an exploratory instrument, facilitating subsequent investigation and model selection.

Correlation heatmap of numerical features

The target variable "condition," which indicates the presence or absence of heart disease, is included in the correlation heatmap, which visually displays the pairwise correlations among the selected numerical properties. The statistical measure known as the correlation coefficient ranges from -1 to 1. A perfect positive correlation is represented by a value of 1, a perfect negative correlation by a value of -1, and no correlation is indicated by a value of 0. The correlation coefficient between age and the dependent variable ‘condition’ is relatively low, at roughly 0.23, suggesting that age alone may not be a reliable indicator of heart disease. However, it is noteworthy that there is still a weak relationship, indicating a possible correlation between advancing age and a higher risk of cardiovascular disease.

The correlation coefficient between the condition under investigation and resting blood pressure (trestbps) is roughly 0.15, which is relatively weak. This suggests that resting blood pressure alone may not significantly influence the chance of heart disease development. The correlation heatmap for the selected numerical characteristics is shown in Fig. 4.

  • Cholesterol (chol) and Condition The correlation between cholesterol levels and ‘condition’ is again weak (around 0.08), indicating that cholesterol levels alone may not strongly predict the presence of heart disease.

  • Maximum Heart Rate (thalach) and Condition The correlation between maximum heart rate and ‘condition’ is negative, with a value of approximately -0.42. A higher maximum heart rate achieved during exercise is associated with a lower risk of heart disease.

  • ST Depression (oldpeak) and Condition The correlation between ST depression and ‘condition’ is moderately strong (around 0.42). A higher ST depression induced by exercise relative to rest is positively associated with the presence of heart disease.

  • Condition (Target) and Itself As expected, the correlation between the target variable ‘condition’ and itself is 1. This serves as a reference point for understanding how other features relate to the presence or absence of heart disease.

Fig. 4.

Fig. 4

Correlation heatmap of selected numerical features.

Analysis of the correlation heatmap reveals patterns in the associations between numerical attributes and the target variable, ‘condition’, which represents the presence or absence of heart disease. The strongest connections with the existence of heart disease are shown by the maximum heart rate attained during exercise (thalach) and the exercise-induced ST depression (oldpeak), suggesting that these variables may be crucial for predicting heart disease. To increase prediction accuracy, it may be necessary to combine them with features that show comparatively weaker correlations, such as age, resting blood pressure, and cholesterol levels.
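As an illustration of how such a heatmap is produced, the sketch below computes pairwise Pearson correlations with pandas and renders them with matplotlib. The six-row sample is fabricated purely for illustration (it is not data from the study); only the column names follow the Cleveland dataset.

```python
# Minimal correlation-heatmap sketch; the data frame below is illustrative,
# not taken from the Cleveland dataset.
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "age":       [29, 45, 54, 61, 70, 76],
    "trestbps":  [120, 130, 140, 138, 150, 160],
    "chol":      [204, 233, 239, 275, 254, 286],
    "thalach":   [190, 172, 160, 145, 130, 118],
    "oldpeak":   [0.0, 0.5, 1.2, 1.8, 2.6, 3.1],
    "condition": [0, 0, 0, 1, 1, 1],
})

corr = df.corr()  # pairwise Pearson coefficients, each in [-1, 1]

fig, ax = plt.subplots(figsize=(5, 4))
im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="correlation")
fig.tight_layout()
fig.savefig("corr_heatmap.png")
```

The diagonal is always 1 (each feature with itself), matching the reference point noted for the ‘condition’ variable above.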

Chest pain type and heart disease presence

The stacked bar plot in Fig. 5 visualizes the distribution of different chest pain types in relation to the presence or absence of heart disease (‘condition’). The x-axis represents the four types of chest pain, and the y-axis represents the count of patients.

  • Typical Angina (Chest Pain Type 1) The majority of patients who experienced typical angina (cp = 1) do not have heart disease (condition = 0). However, many patients with typical angina also have heart disease (condition = 1).

  • Atypical Angina (Chest Pain Type 2) Patients with atypical angina (cp = 2) exhibit a more balanced distribution between the presence and absence of heart disease. The number of patients with atypical angina and heart disease (condition = 1) is slightly lower than the number without (condition = 0).

  • Non-anginal Pain (Chest Pain Type 3) Patients with non-anginal pain (cp = 3) show a pronounced association with heart disease: the number of patients with non-anginal pain and heart disease (condition = 1) is comparatively large.

Fig. 5.

Fig. 5

Types of chest pain and presence of heart disease.

The stacked bar plot suggests that the type of chest pain is associated with the presence of heart disease, although it is not a definitive indicator. While some chest pain types show a stronger association with heart disease (non-anginal pain), others have a more balanced distribution (atypical angina). Chest pain type must therefore be considered together with other features for a more accurate prediction of heart disease.

Age and Maximum Heart Rate Achieved The relationship between age and maximum heart rate attained is visually illustrated by the facet grid. Because each facet (subplot) corresponds to a particular age group, the maximal heart rate can be compared across age groups.

The maximal heart rate that may be obtained generally declines slightly with age. The majority of the age groups represented in the facet grid exhibit this trend. The distribution of data points for each age group is displayed in the scatter plots within each facet. The maximal heart rate variations for people within the same age group are shown by points spread across the plots.

Figure 6 shows a facet grid illustrating the relationship between age and the maximum heart rate achieved. The data show that people in the younger age groups (e.g., 29, 34, and 37) reach a higher range of maximum heart rates than those in the older age groups (e.g., 71, 74, and 76). The facet grid thus provides important evidence of how age relates to peak heart rate: there is a tendency for the maximal heart rate to decrease with age. At the same time, the plot demonstrates considerable variation in the highest heart rate achieved within each age group, suggesting that variables beyond age may influence the attainment of peak heart rate. A person’s maximum heart rate must be interpreted in light of physical conditioning, lifestyle habits, and underlying medical conditions.

Fig. 6.

Fig. 6

Facet grid-age and maximum heart rate achieved.

Type of chest pain and achieved maximum heart rate

Figure 7 presents a swarm plot of individuals’ maximum heart rates grouped by the specific form of chest pain. The plot displays numerous data points for each form of chest pain, each point representing an individual patient, enabling us to examine how maximal heart rates are distributed and whether any trends or inconsistencies exist for each type. Individuals with “typical angina” have peak heart rates that fall in a range between roughly 100 and 190 bpm.

Fig. 7.

Fig. 7

Chest pain type and heart rate achieved swarm plot.

For atypical angina, the maximum heart rates usually fall between 120 and 170 beats per minute (bpm), with most values clustered around 140–170 bpm. In the group of patients with non-anginal pain, the heart rate varies widely, from 100 to 180 bpm, with the highest concentration between 140 and 150 bpm. The swarm plot thus compares the forms of chest pain and illustrates the differences in the highest attainable heart rates in each group. Typical angina shows a broader spectrum of maximal heart rates, whereas atypical angina and non-anginal pain present, as a rule, more closely clustered values. The plot suggests that individuals presenting with different types of chest pain may differ in their heart rate response. The broader range seen with typical angina could indicate a more diverse set of underlying cardiovascular conditions, while the greater consistency of maximal heart rates among patients reporting non-anginal pain may indicate a correlation with more uniform cardiac issues.

The slope of peak exercise ST segment and heart disease

Figure 8 visually illustrates the distribution of the slope of the peak exercise ST segment between people with and without cardiovascular disease. By examining the comparative frequencies of the different slope types in each group, we can assess possible relationships between slope features and the incidence of heart disease.

Fig. 8.

Fig. 8

Slope of peak exercise ST segment and heart disease.

The topic of interest is the slope of the ST segment during peak exercise. The count plot displays three distinct groups, representing the slope of the peak exercise ST segment: "upsloping," "flat," and "downsloping."

The bars in the plot depict, for each slope category, the frequency of individuals with and without cardiac disease. One set of bars displays the distribution of slope categories observed in individuals in good health, while the other shows the distribution of heart disease patients across the slope types. Figure 8 thus depicts the distribution of slope values among people diagnosed with the disease.

This figure demonstrates the prevalence patterns of the various slope types among individuals diagnosed with heart disease. For example, when one slope type makes up a greater percentage of the heart disease group than of the disease-free group, that slope type may represent a potential association with heart disease. Conversely, a slope type that is relatively less frequent among heart disease patients than among disease-free individuals may indicate a reduced association.

The count plot allows visual comparison of the numbers of individuals with and without heart disease in each slope category, pointing to possible correlations between slope patterns and cardiovascular disease. A flat slope is markedly more common among patients with heart disease, while upsloping ECG patterns have a high incidence in people without heart disease. The downsloping pattern is more balanced, with individuals with heart disease distributed at roughly the same frequency as those without.

Cholesterol and maximum heart rate achieved

The joint plot is a graphical representation of the relationship between a patient’s cholesterol level and their maximal heart rate. By combining a central scatter plot with marginal histograms, it shows the relationship between the two variables as well as their individual distributions.

The patient’s cholesterol levels are plotted on the horizontal axis, and the maximum heart rate that patients can achieve is shown on the vertical axis, or y-axis. The data points are distributed as indicated by the central scatter plot, with each point representing the maximal heart rate and cholesterol level of a single patient. The histograms at the top and right borders of the graph show the frequency distribution of cholesterol levels and the maximal heart rate attained.

Figure 9’s scatter plot illustrates the underlying pattern reflected in the data points. A possible correlation between cholesterol levels and the highest heart rate reached can be inferred from a continuous pattern of increase or decrease in the data points. A positive correlation indicates a statistical relationship between increased maximal heart rates and raised cholesterol levels. A negative correlation, on the other hand, denotes an inverse relationship between these variables. The histograms in the margins provide essential information about the distribution patterns of cholesterol levels and the maximum heart rate reached.

Fig. 9.

Fig. 9

Joint plot—cholesterol and maximum heart rate achieved.

The histograms’ form and dispersion reveal information about the data distribution of each variable. A positive correlation between maximal heart rate and cholesterol levels may suggest that individuals with higher cholesterol levels may reach a higher maximum heart rate during physical activity. A negative correlation, however, would indicate that people with high cholesterol also tend to have lower peak heart rates when exercising physically. Using joint plots can be helpful in identifying correlations between a person’s cholesterol levels and their highest heart rate, which can be beneficial for medical professionals.

EDA analysis

The visual representations of the Heart Disease Cleveland UCI dataset provided several insights relevant to heart disease prediction, including a thorough analysis of the relationships between numerical attributes. In the pair plot, the diagonal plots showed the distributions of the individual features, while the off-diagonal scatter plots illustrated the pairwise relationships between features. The pair plot was used to approximately determine any linear or non-linear relationships between particular features, which informs feature selection and engineering during model creation.

A correlation heatmap was used to visualize the pairwise connections between numerical attributes. Where variables are strongly correlated, this indicates multicollinearity and redundancy in the dataset; the presence of multicollinearity, as well as the relevance of each feature, should be considered when creating and building a model.

A facet grid display was then used to show how peak heart rates changed with age. Discrete clusters appeared on the plot for each age group, which may indicate trends in the maximum heart rate achieved with age. The swarm plot indicated that patients with different chest pain types may reach unequal maximum heart rates, with non-anginal and asymptomatic pain tending toward higher maximum heart rates.

The comprehensive knowledge of the dataset was exploited to support visualizations that would assist in the detection of possible patterns or correlations with heart disease. These insights are significant as they enable a greater specificity of the cardiac disease prediction results and allow one to select features to develop, tailor, and optimize models. It is suggested that more data exploration and feature engineering are needed in order to generate reliable prediction models to diagnose and prognosticate heart illnesses.

Methodology

Late fusion over convolutional neural networks

Late fusion over convolutional neural networks is a compelling deep learning technique for combining data from multiple sources or modalities in an efficient manner, thereby improving prediction accuracy. In this method, specialized models first analyze each data modality separately; the outputs of the individual models are then combined or integrated to make the final prediction. The fundamental aspect of late fusion is that dedicated model architectures are created for specific data modalities. A CNN can be designed specifically to evaluate image input and extract valuable features, while a highly specialized DNN can process tabular data and learn complex patterns. Such expert models can focus on the unique features of a given data modality and generate representations that precisely identify its most important patterns. The CNN model is based on convolutional layers that extract spatial characteristics from data arranged as 2D matrices (images). The output of the CNN model can be represented as:

  • y_cnn = CNN(x_img)

    The DNN model is composed of densely connected layers that take tabular numerical features as input and can detect complex correlations. The output of the DNN model can be represented as:

  • y_dnn = DNN(x_tab)

    Here, x_img denotes the image-like input presented to the CNN and x_tab the tabular input presented to the DNN. The fusion layer concatenates the outputs of the CNN and DNN models to integrate the information obtained from both branches:

  • y_fusion = concatenate(y_cnn, y_dnn)

    The late fusion model thus combines the sequential patterns obtained from the DNN with the spatial information derived from the CNN. By combining the input from both branches, the model achieves a greater capacity to learn the finer patterns in the data and therefore exhibits better predictive performance. The concatenated feature representation is passed through a dense layer to further process the fused data:

  • y_dense = dense(y_fusion)

    The last layer of the heart disease classification model is the output layer, which employs a sigmoid activation function for binary classification of the disease’s presence or absence. The model output is represented as:

  • y_out = sigmoid(dense(y_dense))

After being processed independently, the results from different models are merged or consolidated, yielding a single, fused representation of the data that incorporates all information from the various data modalities. The derived fused representation is then sent through layers, such as fully connected layers, to get the final forecast.
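The equations above can be traced end to end in a small NumPy sketch. The branch "models" here are random-weight stand-ins (not the paper's trained CNN and DNN) and are meant only to show how y_cnn and y_dnn are fused and passed through the dense and sigmoid layers.

```python
# Late-fusion forward pass sketch: two branch outputs are concatenated,
# processed by a dense layer, and mapped to a probability by a sigmoid.
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    # Fully connected layer with ReLU activation
    return np.maximum(x @ w + b, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_img = rng.normal(size=(1, 13))  # image-like input (flattened for simplicity)
x_tab = rng.normal(size=(1, 13))  # tabular input

# Stand-ins for y_cnn = CNN(x_img) and y_dnn = DNN(x_tab)
y_cnn = dense(x_img, rng.normal(size=(13, 8)), np.zeros(8))
y_dnn = dense(x_tab, rng.normal(size=(13, 8)), np.zeros(8))

# Fusion layer: y_fusion = concatenate(y_cnn, y_dnn)
y_fusion = np.concatenate([y_cnn, y_dnn], axis=1)

# Dense layer over the fused features, then the sigmoid output
y_dense = dense(y_fusion, rng.normal(size=(16, 4)), np.zeros(4))
y_out = sigmoid(y_dense @ rng.normal(size=(4, 1)))
```

Because the sigmoid maps any real value to (0, 1), y_out can be read directly as the predicted probability of disease presence.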

As such, the late fusion technique is beneficial when dealing with complex datasets that contain multiple types of data or data modalities. This enables the efficient integration of multiple information sources, including text, images, and quantitative data, to optimize the overall performance of the combined model. Late fusion models are more accurate and robust than single-modality models because they can combine the best features of multiple models, as well as various data modalities.

Late fusion has been successfully applied in various fields, including convolutional neural networks in computer vision, natural language processing, medical image analysis, and multimodal data analysis. Its adaptability and flexibility across tasks and data sources make it a powerful and versatile deep learning tool. Late fusion models succeed, however, only if they are carefully designed to combine data from multiple sources effectively; like any other machine learning method, this involves hyperparameter optimization and effective data preprocessing. The general structure of late fusion is illustrated in Fig. 10.

Fig. 10.

Fig. 10

Late fusion general architecture.

Model design

CNNs can be employed for late fusion by integrating data from multiple modalities or sources into a single neural network model. This method utilizes a specialized neural network design to evaluate each data modality independently and then combine all the model results to produce its final prediction.

In the Cleveland UCI dataset, a late fusion technique is efficiently used to handle cardiac conditions.

  • Tabular data Tabular data, also known as structured data, refers to traditional data organized in a tabular format, consisting of numerical attributes such as age, blood pressure, and cholesterol levels, among others.

  • Image-like data Image-like data refers to converting tabular data into representations that resemble grayscale images. It involves transforming numerical attributes into two-dimensional matrices. The tabular data is transformed into a two-dimensional image by representing the features on one axis and the number of samples on the other.

The late fusion process encompasses a series of phases:

  • Data Preparation The data preparation process involves dividing the dataset into characteristics (the tabular features) and classes (the presence or absence of cardiac disease). The tabular attributes are split into training, validation, and testing sets. The StandardScaler is employed to normalize the tabular characteristics, after which image-like representations of the tabular features are generated. Table 5 presents the sample sizes.

  • CNN Model A CNN model is developed to analyze tabular data that exhibits visual characteristics similar to images. Convolutional layers in the CNN architecture extract hierarchical patterns from image-like input, whereas pooling layers subsequently downsample the feature maps. The CNN model is utilized to analyze image-like data and extract relevant information. Convolutional layers serve as the fundamental building blocks of CNN architecture, enabling the acquisition and extraction of hierarchical patterns from input data that resembles images. The convolutional layers have many filters or kernels that traverse the 2D matrices to perform convolution operations. Every filter detects a specific attribute or arrangement within the data resembling a picture, such as edges, textures, or shapes. The feature maps generated by the convolutional filters highlight the spatial distribution of patterns within the input matrices.

  • DNN Model A different classical DNN model is created to handle the tabular characteristics. Fully interconnected layers comprise the DNN architecture, which uses tabular data to discover complicated connections. There are often several layers that are ultimately linked in the DNN model. Each neuron in a fully connected layer receives information from every neuron in the layer above and calculates a weighted sum of these inputs. Following the introduction of non-linearity through the activation function, the weighted sum enables the model to train on complex interactions.

Table 5.

Sample sizes.

Training sample size Testing sample size Validation sample size
(282,13) Features (14,13) Features (14,13) Features
(282) Classes (14) Classes (14) Classes
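The preparation steps described above (standardize, then reshape into image-like matrices) can be sketched with NumPy, using random data at the Table 5 training shape; the z-scoring below computes the same transform as StandardScaler.

```python
# Data-preparation sketch: per-feature z-score normalization followed by
# reshaping each 13-feature sample into a (13, 1) matrix for the Conv1D branch.
# The input array is random and purely illustrative.
import numpy as np

rng = np.random.default_rng(42)
x_train = rng.normal(loc=50.0, scale=10.0, size=(282, 13))  # Table 5 shape

mu = x_train.mean(axis=0)
sigma = x_train.std(axis=0)
x_scaled = (x_train - mu) / sigma          # per-feature z-scores

x_img = x_scaled.reshape(282, 13, 1)       # image-like input for the CNN
```

After scaling, each feature has mean 0 and unit standard deviation, which keeps the two branches' inputs on comparable scales.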

The model architecture design for late fusion is presented in Fig. 11.

  • Fusion Layer The fusion layer is formed by concatenating the outputs from the CNN and the DNN models, facilitating the integration of the representations derived from the image-like and tabular data. The concatenated feature representation encompasses the high-level patterns extracted from the image-like data and the relationships identified in the tabular data. In this way, the fusion layer integrates the connections identified by the DNN in the tabular data with the high-level spatial patterns discovered by the CNN. By combining spatial and sequential information, the model boosts its predictive power for heart disease forecasting.

  • Output Layer The output layer of the neural network model utilizes a dense layer with a sigmoid activation function to classify individuals into two classes: those with and those without heart disease. The final result of the model is the probability that heart disease exists for a given sample.

  • Training and Evaluation The training and evaluation process involves building the late fusion model with an appropriate optimizer (e.g., Adam) and a binary cross-entropy loss function. To help mitigate overfitting, the model is trained on the image-like and tabular features together while a separate validation set is monitored. After the training phase is completed, the model is evaluated against a held-out test set designated to assess its predictive power.

Fig. 11.

Fig. 11

Model architecture design late fusion.

With the help of late fusion techniques, CNNs can incorporate information from multiple data modalities that the model can benefit from. Late fusion models can enhance prediction precision and generalization by integrating image-like and tabular data. To achieve maximum performance in predicting heart disease using the Heart Disease Cleveland UCI dataset, the model architecture, hyperparameter choices, and data representations need to be refined.

Model architecture design

For the binary classification of heart disease presence, the model employs a late fusion approach that combines CNN and DNN. A description of the model’s design is provided.

  • Data Preparation
    • The dataset is divided into three distinct sets: training, validation, and testing.
    • The dataset’s numerical features are scaled using the StandardScaler method.
    • The attributes are reshaped into two-dimensional matrices for CNN input, resulting in representations that resemble images.
  • The CNN Model
    • The primary purpose of developing the Convolutional Neural Network (CNN) model was to efficiently process and analyze image data.
    • Following a 1-dimensional convolutional layer with 512 filters and a kernel size of 2, a Rectified Linear Unit (ReLU) activation function is applied. The convolutional layer can acquire hierarchical patterns from the input, represented in an image-like format.
    • The output of the convolutional layer is subjected to downsampling through the utilization of a MaxPooling1D layer, which employs a pool size of 2. The salient characteristics within the feature maps are retained, while the spatial dimensions are reduced.
    • To mitigate the risk of overfitting, a dropout layer with a dropout rate of 0.1 is incorporated into the model architecture. The dropout technique randomly deactivates a subset of neurons throughout the training process. This deliberate deactivation decreases the model’s dependence on individual neuron activations, improving its ability to generalize.
    • A one-dimensional vector is generated by compressing the output of the convolutional layers to facilitate its input into the subsequent dense layers.
    • The final output of the CNN model is a feature vector encompassing the relevant patterns extracted from the image-like input.

Table 6 presents the characteristics and configuration of the CNN model employed within the late fusion model.

Table 6.

CNN model properties.

Layer Output shape Parameters
Conv1D (None,12,512) 1536
MaxPool1D (None,6,512) 0
Dropout (None,6,512) 0
Flatten (None,3072) 0

The model architecture design is depicted in Fig. 12.
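The shapes and parameter counts in Table 6 can be verified with a short pure-Python check, assuming a 13-feature input, "valid" padding, and the standard Conv1D parameter formula (kernel_size × input_channels + 1 bias, per filter).

```python
# Sanity check of the CNN branch in Table 6 for a 13-feature input.
in_len, in_ch = 13, 1      # image-like input: 13 features, 1 channel
filters, kernel = 512, 2   # Conv1D configuration from the text

conv_params = (kernel * in_ch + 1) * filters  # (2*1 + 1) * 512 = 1536
conv_len = in_len - kernel + 1                # "valid" convolution: 13 - 2 + 1 = 12
pool_len = conv_len // 2                      # MaxPool1D, pool size 2: 6
flat = pool_len * filters                     # Flatten: 6 * 512 = 3072

print(conv_params, conv_len, pool_len, flat)
```

These values reproduce the 1536 parameters, the (None, 12, 512) and (None, 6, 512) shapes, and the 3072-unit flattened vector listed in Table 6.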

Fig. 12.

Fig. 12

Model architecture design.

  • The DNN Model
    • The DNN model processes the numerical characteristics of the dataset.
    • A Dense layer with 128 units and a ReLU activation function forms the foundation. Thanks to this dense layer, the model can learn intricate correlations and representations from the tabular data.
    • For regularization, a dropout layer with a rate of 0.1 is introduced. By randomly deactivating neurons during training, dropout helps prevent overfitting.
    • The second dense layer is placed following batch normalization. The activations are normalized during batch normalization, which promotes quicker convergence and lessens internal covariate shift.
    • The robustness of the model is ensured by using a second dropout layer with a rate of 0.5.

Table 7 shows the properties of the DNN model being applied in the Late Fusion Model. Both models will be concatenated in the fusion layer, and the output will be based on spatial and tabular features.

Table 7.

DNN model properties.

Layer Output shape Parameters
Dense (None,128) 1792
Dropout (None,128) 0
Batch normalization (None,128) 512
Dropout (None,128) 0
Dense (None,30) 3870
Batch normalization (None,30) 120
Dropout (None,30) 0
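The DNN parameter counts in Table 7 follow the same kind of arithmetic: a dense layer has (inputs × units + units) parameters, and batch normalization carries 4 parameters per unit (gamma, beta, and the moving mean and variance). A short check, assuming the 13-feature tabular input:

```python
# Sanity check of the DNN branch parameter counts in Table 7.
def dense_params(n_in, n_units):
    # Weight matrix plus one bias per unit
    return n_in * n_units + n_units

def batchnorm_params(n_units):
    # gamma, beta, moving mean, moving variance
    return 4 * n_units

p_dense1 = dense_params(13, 128)   # 13*128 + 128 = 1792
p_bn1 = batchnorm_params(128)      # 4*128 = 512
p_dense2 = dense_params(128, 30)   # 128*30 + 30 = 3870
p_bn2 = batchnorm_params(30)       # 4*30 = 120
```

All four values match the 1792, 512, 3870, and 120 entries in Table 7; the dropout layers contribute no parameters.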
  • Fusion Layer
    • A fused representation is produced by concatenating the outputs of the CNN and DNN models.
    • The CNN excels at learning hierarchical spatial patterns and the associations between variables represented as 2D matrices in image-like data. On the other hand, the DNN excels at deciphering the complex relationships and sequential patterns seen in tabular data.
    • A dense layer with 64 units and ReLU activation is applied for feature fusion.
    • In this stage, the correlations found in the tabular data are combined with the high-level patterns created from the image-like data.
  • Output Layer:
    • The final output layer consists of a single neuron with a sigmoid activation function for binary classification (heart disease present or absent).
  • Model Compilation
    • The Adam optimizer, an adaptive learning rate optimization technique, is used to create the late fusion model. It facilitates quicker convergence and works well with large datasets.
    • The binary cross-entropy loss function is employed because binary classification problems require it.
    • Using the EarlyStopping callback, the model is trained on the training data, with the validation data used for early stopping. When the validation loss stops decreasing, training is halted to prevent overfitting.
    • The 512 filters in the CNN were chosen to provide enough capacity to capture a wide range of spatial patterns within the image-like representations of the tabular features, enabling the model to learn hierarchical correlations without underfitting. The CNN uses a dropout rate of 0.1, which mildly regularizes the convolutional layers: it preserves most of the spatial feature information while making the network less dependent on individual neurons. By comparison, the DNN branch employs a dropout rate of 0.5 to strongly regularize the fully connected layers, which are prone to overfitting because they are heavily parameterized. These values were selected through trial and error and validation performance, trading off model complexity against generalization to avoid overfitting while retaining high predictive accuracy.
  • Evaluation
    • To determine the model’s accuracy and loss, a test set is used to evaluate it.
    • The late fusion model is used to make predictions on the test set.
    • The late fusion model leverages both the capabilities of DNNs to handle numerical characteristics and the strengths of CNNs in processing image-like input, thereby providing a more comprehensive representation for the prediction task. The model’s capacity to generalize is ensured by including batch normalization and dropout layers, which prevent overfitting.
  • Accuracy
    • The most accessible indicator is accuracy, which measures the proportion of correct predictions among all predictions made. It provides an overall view of the model’s performance across the entire dataset. Accuracy is calculated as follows:
      Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision
    • Precision is the proportion of true positive predictions (correctly predicted heart disease cases) among all positive predictions the model generates. It demonstrates the model’s ability to avoid producing false positives.
      Precision = TP / (TP + FP)
  • Recall (Sensitivity or True Positive Rate)
    • Recall is the proportion of true positive predictions among all actual positive instances in the dataset. It shows how well the model recognizes every positive case.
      Recall = TP / (TP + FN)
  • F1 Score
    • The F1 score is the harmonic mean of precision and recall. It is helpful when the class distribution is unbalanced, as it strikes a balance between precision and recall.
      F1 = 2 × (Precision × Recall) / (Precision + Recall)
  • Confusion Matrix
    • The confusion matrix is a table that compares the model’s predictions to the actual data. Its four values are True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).
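The architecture described above can be sketched in Keras roughly as follows. Layer sizes (the 512-filter 1D convolution, the DNN layers of Table 7, the 64-unit fusion layer, the dropout rates of 0.1 and 0.5, and the sigmoid output) follow the text; the kernel size, pool size, and exact dropout placement are assumptions where the text does not specify them.

```python
# Hedged sketch of the late fusion CNN-DNN model; kernel and pool sizes
# are assumptions not stated in the text.
from tensorflow.keras import layers, models, callbacks

n_features = 13  # UCI heart disease attributes

# CNN branch: scans an image-like (reshaped) view of the tabular features
cnn_in = layers.Input(shape=(n_features, 1))
x = layers.Conv1D(512, kernel_size=3, padding="same", activation="relu")(cnn_in)
x = layers.MaxPooling1D(pool_size=2)(x)
x = layers.Dropout(0.1)(x)          # mild regularization (see text)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)

# DNN branch: raw numerical features, layer order following Table 7
dnn_in = layers.Input(shape=(n_features,))
y = layers.Dense(128, activation="relu")(dnn_in)   # 13*128 + 128 = 1792 params
y = layers.Dropout(0.5)(y)
y = layers.BatchNormalization()(y)
y = layers.Dropout(0.5)(y)
y = layers.Dense(30, activation="relu")(y)         # 128*30 + 30 = 3870 params
y = layers.BatchNormalization()(y)
y = layers.Dropout(0.5)(y)

# Fusion layer: concatenate both representations, then a 64-unit dense layer
fused = layers.Concatenate()([x, y])
fused = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(1, activation="sigmoid")(fused)  # binary classification

model = models.Model(inputs=[cnn_in, dnn_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
early_stop = callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True)
```

Training would then call `model.fit([X_image_like, X_tabular], y, validation_data=..., callbacks=[early_stop])`, matching the compilation and early-stopping setup described above.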

Results

Model performance analysis

The model showed excellent accuracy and low loss on the training and validation datasets. It shows that the model can produce precise predictions and has successfully located the underlying patterns in the dataset. The model’s remarkable accuracy—a flawless 1.0, or 100%—on the test and validation sets indicates that it can appropriately generalize to data that hasn’t been seen before. However, even when the model exhibits remarkable performance on the validation set, care must be taken to avoid overfitting. One effective method for mitigating the problem of overfitting is to employ early stopping during the training phase, specifically by utilizing the EarlyStopping callback.

Results achieved after training the model for 100 epochs:

  • Training Performance
    • Loss: 0.0243
    • Accuracy: 0.9929
  • Validation Performance
    • Loss: 0.0536
    • Accuracy: 1.0
  • Test Performance
    • Loss: 0.0284
    • Accuracy: 1.0

The model’s performance is illustrated in Fig. 13, which shows the training and validation accuracy. The model exhibits no discernible overfitting, as evidenced by the slight difference between the training and validation accuracy. Notably, the validation accuracy is slightly higher than the training accuracy, which can occur when regularization such as dropout is active during training but disabled during evaluation. A well-calibrated model generates predictions that closely match the true labels with minimal loss on the training and validation datasets. The test set results show that the model performs very accurately even on data it has not seen during training, implying that the model is trustworthy and generalizes well to fresh, untested data. Fig. 14 illustrates the corresponding training and validation losses.

Fig. 13.

Fig. 13

Model training and validation accuracy performance.

Fig. 14.

Fig. 14

Model training and validation loss performance.

The detection of heart pathology has proven to be very effective when DNN and CNN models are combined by late fusion. The model achieves almost perfect accuracy and negligible loss on both the training and test datasets, demonstrating its high reliability and robustness. Dropout and batch normalization layers were carefully applied and effectively reduced overfitting, resulting in a well-performing system. Nevertheless, it is important to evaluate how well the model performs on live data and with additional metrics to obtain an accurate picture of its real-world effectiveness.

Additionally, training the late fusion model requires approximately 10 min, including the time needed for the parameter tweaking described in our experiments section. We applied hyperparameter tuning followed by model fine-tuning, which increased accuracy and efficiency while reducing computational costs. Achieving nearly 100 percent accuracy and negligible loss within this modest training budget demonstrates that the late fusion approach detects heart pathology efficiently, without incurring unnecessary resource costs. However, the model’s real-world performance still needs further evaluation. Nevertheless, dropout and batch normalization improve the model’s robustness.

Model evaluation

The model evaluation results indicate that the late fusion strategy, applied to a CNN, has proven to be incredibly effective in precisely predicting the incidence or non-occurrence of cardiac conditions. Table 8 illustrates how the assessment metrics demonstrate the model’s accuracy and dependability in forecasting results on the training and validation datasets.

Table 8.

Evaluation metrics of the proposed model.

Evaluation metric Performance value
AUC 1.0
Accuracy 0.99
Precision 1.0
Recall 1.0
F1 score 1.0
  • Accuracy: 1.0000

Accuracy refers to the proportion of correct predictions among the total number of predictions made. The model successfully classified each sample in the test set, achieving a perfect accuracy of 1.0000 (100%) and demonstrating its strong predictive capability.

  • Precision:1.0000

Precision can be defined as the proportion of correct positive predictions: the number of correctly predicted heart disease cases divided by the total number of positive predictions. The model’s 1.0000 (100%) precision indicates that all identified positive instances are true positives, evidencing its accuracy in identifying individuals with genuine cardiac illness.

  • Recall: 1.0000

The recall metric, also known as sensitivity, quantifies the proportion of true positive samples (i.e., all instances of heart disease) that were accurately classified. A recall rate of 1.0000 (100%) indicates that the model successfully identified all positive instances, demonstrating its ability to detect all patients with heart disease accurately.

  • F1 score: 1.0000

The F1 score, a crucial metric in imbalanced datasets, is the harmonic mean of precision and recall. An F1 score of 1.0000 (100%) without errors indicates that the model exhibits well-balanced performance on both positive and negative instances, achieving complete accuracy and recall. Figure 15, depicted below, presents the model evaluation metrics: loss, accuracy, precision, recall, and F1 score.

Fig. 15.

Fig. 15

Model evaluation metric results.

Advancements in model performance

Based on the evaluation of various models as presented in Table 9, it is clear that our paper’s solution outperforms all the referenced studies in all the parameters. The result comparison is done according to the Area Under the Curve (AUC), Accuracy, Precision, Recall, and F1 Score only.

Table 9.

Performance parameters comparison.

References AUC Accuracy Precision Recall F1 Score
35 0.95 0.98 0.98 0.97 0.97
36 0.98 0.98 0.99 0.98 0.98
37 0.96 0.96 0.97 0.96 0.96
38 0.97 0.98 0.98 0.97 0.97
39 0.99 0.98 0.99 0.98 0.98
Proposed work 1.0 0.99 1.0 1.0 1.0

Our proposed model outperforms all other approaches, achieving the highest AUC of 1.0. This metric indicates that, among all the methods compared, our model has outstanding performance in discriminating between classes, an enhancement over the highest AUC of 0.99 reported by39. Likewise, the proposed model achieved an Accuracy of 0.99, higher than the second-best accuracy of 0.98 reported in36,38,39. Accuracy is equally important since it reflects the generalization error of the model.

As for Precision, Recall, and F1 Score, our model achieves the highest score of 1.0. Precision and Recall are critical because they determine the model’s ability to correctly identify the positive cases. A Precision and Recall of 1.0 mean that our model commits no false positives or false negatives, exceeding the best scores of 0.99 Precision and 0.98 Recall reported in36,39.

Our model performs better because of the algorithm’s sophistication and a more effective way of processing the data. As the results reveal, the improvements across all performance parameters confirm the efficacy and reliability of our proposed model.

Confusion matrix

The model’s predictions are organized and recorded as a confusion matrix, as shown in Fig. 16. The subject under consideration encompasses four distinct values.

  • True Positive (TP): A TP refers to the count of heart disease cases that have been correctly predicted, indicating situations where the prediction aligns with the actual positive occurrences.

  • TP = The variable TP represents the count of samples correctly predicted as positive and subsequently tested positive.

  • True Negative (TN): The true negative (TN) refers to the accurate prediction of cases that do not involve heart disease.

  • TN = the count of negative samples that have been accurately predicted as negative.

  • False-positive (FP) cases refer to instances in which individuals are incorrectly classified as having a cardiac illness, sometimes called false alarms (Type I errors).

  • FP = the count of negative samples falsely predicted as positive.

Fig. 16.

Fig. 16

Confusion matrix heatmap.

It indicates that 9 out of 9 test data samples belonging to the no-disease class were correctly predicted, and 6 out of 6 test data samples belonging to the disease class were correctly predicted.

  • False Negative (FN) cases are instances of cardiac disease misclassified as not having it (missed cases, or Type II errors).

  • FN = the count of samples predicted as negative that are actually positive.

As seen in the confusion matrix:

  • (True Positive) TP: 6

  • (True Negative) TN: 9

  • False Positive (FP): 0

  • False Negative (FN): 0

False positives and negatives are absent, proving that the model correctly classified positive and negative classes with no mistakes.
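The four counts above and the metrics derived from them can be reproduced with scikit-learn; the labels below are illustrative stand-ins recreating an error-free split of 6 disease and 9 no-disease test samples, as described above.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Illustrative labels: 6 disease (positive) and 9 no-disease (negative)
# samples, with zero misclassifications as reported above.
y_true = np.array([1] * 6 + [0] * 9)
y_pred = y_true.copy()  # perfect predictions: FP = FN = 0

# sklearn's confusion_matrix flattens in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                   # 6 9 0 0
print(accuracy_score(y_true, y_pred))   # 1.0
print(precision_score(y_true, y_pred))  # 1.0
print(recall_score(y_true, y_pred))     # 1.0
print(f1_score(y_true, y_pred))         # 1.0
```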

ROC curve

The Receiver Operating Characteristic (ROC) curve in Fig. 17 depicts the model’s performance at different classification thresholds. Varying the threshold brings to light the intrinsic trade-off between the true positive rate (also referred to as sensitivity or recall) and the false positive rate (the complement of specificity). An ideal classifier would have a true positive rate of 1.0 (100%) and a false positive rate of 0.0 (0%), so its ROC curve would reach the upper-left corner.

Fig. 17.

Fig. 17

AUC-ROC curve.

A perfect Area Under the Curve (AUC) value of 1.0 on the Receiver Operating Characteristic (ROC) curve indicates that, in the current scenario, the late fusion over the CNN model exhibits flawless discriminatory capability. With a true positive rate of 1.0, the model successfully identified every case of heart disease. Furthermore, it successfully identified all instances of non-heart disease with a false-positive rate of 0.0.

An ideal algorithm has a ROC curve with an area under the curve (AUC) of 1.0, meaning it discriminates optimally between the two groups: those with and without heart disease. The model’s high AUC score shows that it is a valid tool for obtaining an accurate and consistent diagnosis of heart disease.

The results demonstrate the model’s ability to achieve perfect accuracy, precision, recall, and F1 score in predicting heart conditions. The confusion matrix validates this accuracy, showing that the model correctly identifies both patients with heart disease and those without (true positives and true negatives). Since the study recorded no false positives or false negatives, the model can serve as a reliable, highly accurate diagnostic tool for heart disease.
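A perfectly separating score distribution reproduces the AUC of 1.0 described above; the labels and scores below are illustrative, not taken from the study’s data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative, perfectly separable scores: every positive outranks
# every negative, which is exactly the AUC = 1.0 situation.
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.95, 0.90, 0.80, 0.20, 0.10, 0.05])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(auc)  # 1.0
```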

Cross dataset evaluation

Even though the model is 100% accurate on the training, validation, and test sets, this does not necessarily mean that it overfits. Several regularization techniques were used when designing the model to ensure generalizability. Both the CNN and DNN branches contain dropout layers, which randomly disable neurons during training, making the model less dependent on any single feature. Batch normalization stabilizes learning and reduces internal covariate shift, whereas the EarlyStopping callback monitors the validation loss and stops training when performance no longer improves, avoiding unwarranted overfitting of the training set.

In addition, external datasets, including the Hungarian and Swiss heart disease datasets, were used to validate the model and test its strength beyond the original data. All these steps ensure that the high performance does not depend on one dataset or a particular data split. The model consistently reported high predictive accuracy across a variety of unseen datasets and folds, showing that it has indeed learned the underlying patterns in the data rather than merely memorizing the samples it was trained on.

The Hungarian Heart Disease dataset has 294 patient records and, like the Cleveland dataset, 13 attributes that include age, sex, resting blood pressure, cholesterol, and maximum heart rate. It contains both positive and negative heart disease cases and offers a somewhat different patient distribution, which tests the model’s ability to generalize.

The Swiss Heart Disease dataset contains 123 patient records with similar features but different demographic and clinical distributions than the Cleveland and Hungarian data. Evaluating on this dataset lets us test the model on a different population, further confirming that the high performance is not due to overfitting on a particular dataset. Table 10 shows the cross-dataset evaluation for the Hungarian and Swiss datasets.

Table 10.

Cross dataset evaluation.

Dataset Accuracy Precision Recall F1 Score AUC Loss
Hungarian 0.997 0.998 0.997 0.9975 0.998 0.0345
Swiss 0.995 0.996 0.995 0.9955 0.996 0.0380

The evaluation metrics indicate that the Late Fusion over CNN model achieves near-perfect performance on both the Hungarian and Swiss datasets. Accuracy, precision, recall, F1 score, and AUC are all above 0.995, showing that the model generalizes well beyond the Cleveland UCI dataset. Minor variations in the Swiss dataset’s metrics can be anticipated because of its smaller sample size and different demographic distribution, but these differences do not significantly influence performance.

The confusion matrix values indicate that the model effectively categorizes positive and negative heart disease cases, with only a handful of false positives and false negatives on each external dataset. This implies little misclassification and demonstrates the model’s strength on new, unseen patient data. The consistency of the results across several datasets is good evidence that the model is not overfitting, despite its perfect accuracy on the baseline dataset.

The low loss values on both datasets are another indicator that the model is learning general patterns rather than particular samples. The methods used to avoid overfitting, such as dropout, batch normalization, and early stopping, allow the model to generalize well. These findings imply that the Late Fusion over CNN method can be applied to real-world heart disease prediction, where patients do not always share the same distribution and data quality.
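The key discipline in cross-dataset evaluation is fitting the preprocessing on the source data only and then applying it unchanged to the external data. The sketch below illustrates that pattern with synthetic arrays standing in for the Cleveland and Hungarian datasets and a logistic regression standing in for the trained model; the shapes (303 and 294 records, 13 features) follow the text, everything else is illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: 303-record "Cleveland" and 294-record "Hungarian"
# sets with 13 features and a simple, shared labeling rule.
X_cleveland = rng.normal(size=(303, 13))
y_cleveland = (X_cleveland[:, 0] > 0).astype(int)
X_hungarian = rng.normal(size=(294, 13))
y_hungarian = (X_hungarian[:, 0] > 0).astype(int)

# Fit the scaler and the model on the source dataset only.
scaler = StandardScaler().fit(X_cleveland)
clf = LogisticRegression().fit(scaler.transform(X_cleveland), y_cleveland)

# External evaluation: transform (never re-fit) with the source scaler.
acc = accuracy_score(y_hungarian, clf.predict(scaler.transform(X_hungarian)))
print(round(acc, 3))
```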

Cross models evaluation

To situate the proposed Late Fusion CNN-DNN model, we compared it with three commonly used baseline models for tabular medical data: Logistic Regression (LR), XGBoost, and a Multi-Layer Perceptron (MLP). These models were trained on the same data (Cleveland UCI Heart Disease) with the same preprocessing steps (scaling and the same train/validation/test split). Their evaluation measures were computed on the test set and compared fairly. Table 11 shows the cross-model evaluation.

Table 11.

Cross model evaluation.

Model Accuracy Precision Recall F1 score AUC
Logistic regression (LR) 0.91 0.90 0.92 0.91 0.94
XGBoost 0.96 0.95 0.97 0.96 0.97
MLP 0.94 0.93 0.94 0.94 0.95
Late fusion CNN-DNN 0.9999 1.0 1.0 1.0 1.0

The baseline models prove that even simple machine learning algorithms can be useful for predicting heart disease. Logistic Regression, a linear model, reaches 91% accuracy, indicating that the variables in the dataset are informative and largely linearly separable. XGBoost, an effective gradient boosting ensemble, improves accuracy to 96%, notably by capturing non-linear feature interactions. The MLP reaches 94% accuracy, slightly lower than XGBoost, demonstrating that a plain neural network can learn more complex patterns but may need further refinement.

By contrast, the proposed Late Fusion CNN-DNN model clearly outperforms the simple baselines, showing almost perfect performance on all metrics. This is largely due to combining spatial pattern extraction from image-like transformations (through the CNN) with numerical pattern recognition (through the DNN). By combining the two representations, the model captures subtle dependencies among the features that traditional tabular models cannot readily exploit. This confirms the value of the non-conventional method and its superior predictive ability on this dataset.

Despite the good performance of the baseline models, the Late Fusion model is strong and highly generalizable, as demonstrated by its uniform performance on the training, validation, and test sets. Whereas simpler models such as LR or the MLP can be effective in many real-world contexts, the fusion model demonstrates that combining several learning paradigms can yield much greater discriminative power. The use of CNNs on image-like tabular transformations is also novel; it improves feature representation and permits the capture of complex patterns that traditional models might miss. Nonetheless, simpler models remain useful as benchmarks, interpretable alternatives, and low-resource options.
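The baseline comparison above can be sketched as follows. Synthetic data stands in for the Cleveland set, and scikit-learn’s GradientBoostingClassifier is swapped in for XGBoost to keep the example dependency-light; the resulting numbers are illustrative, not the values of Table 11.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 303-record, 13-feature Cleveland dataset.
X, y = make_classification(n_samples=303, n_features=13, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=42)

# Train each baseline on the identical split and score it on the test set.
results = {}
for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("GBM", GradientBoostingClassifier(random_state=42)),
                  ("MLP", MLPClassifier(max_iter=2000, random_state=42))]:
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))
    print(name, round(results[name], 3))
```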

Discussion

The study’s results for late fusion over CNN for heart disease prediction were excellent, with near-perfect accuracy, precision, recall, and F1 score. The late fusion model, which combines an image-like representation processed by the CNN with numerical data processed by the traditional DNN, shows outstanding discriminating power for separating patients with and without cardiac ailments. It achieved 100% overall accuracy, delivering correct answers for every occurrence in the dataset. This high accuracy illustrates that the model has an excellent understanding of the fundamental distinctions between patients with heart disease and those without. Because misclassifications could hurt patient care, such high accuracy is required for medical diagnosis.

The asynchronous federated learning approach for cardiac prediction (AFLCP)25 combines an asynchronous learning strategy with DNNs over a heart disease dataset. The method includes a temporally weighted aggregation methodology and periodic updating of the DNN parameters, which accelerates convergence and improves the accuracy of the core model. The AFLCP approach is assessed on two datasets with different DNN architectures. Results demonstrate that the AFLCP strategy uses significantly less communication than the baseline strategy and produces more accurate models than the baseline method.

This work presents a unique ensemble QMBC method for differentiating between subjects with and without a heart disease diagnosis13. The QMBC model works remarkably well on datasets, including binary classifications, because it uses an ensemble of seven models. Multilayer perceptrons, logistic regression, random forests, K-nearest neighbors, naive Bayes, and support vector machines are a few of these models. This work employed feature extraction and selection approaches to speed up the prediction process. We take a sample subset and use ANOVA and Chi-Square to determine which 10 attributes are the best. After that, principal component analysis is applied to the subgroup to identify nine essential components. The lowest Boolean expression for the required characteristic is found using an ensemble of all seven models and Quine McCluskey’s approach.

Machine learning techniques are applied in this work10 to improve the predictability of cardiac disease. Six methods are used: random forest, K-nearest neighbor, logistic regression, Naive Bayes, gradient boosting, and AdaBoost classifiers on datasets from the Cleveland and IEEE Dataport. To increase model accuracy, five-fold cross-validation and gridsearchCV are applied. AdaBoost outperformed the competition in the IEEE Dataport dataset with 90% accuracy, whereas logistic regression performed better with 90.16% accuracy in the Cleveland dataset. With all six algorithms combined into a soft voting ensemble classifier, the accuracy was 95% for the IEEE Dataport dataset and 93.44% for the Cleveland dataset, resulting in a considerable performance improvement. This performed better on both datasets than the AdaBoost and logistic regression classifiers. To select the best model parameters, adjust hyperparameters, and assess performance using accuracy and negative log loss metrics, this work uses GridSearchCV with five-fold cross-validation. This study also examined the accuracy loss for each fold to evaluate the model’s performance on the two benchmark datasets. The soft voting ensemble classifier algorithm produced significantly better results than earlier studies on heart disease prediction.

With a precision of 100%, every instance predicted as positive for heart disease was indeed positive. This is a crucial healthcare indicator since it shows how rarely the model incorrectly diagnoses someone who does not have heart disease. High precision is required to reduce erroneous diagnoses, unnecessary medical procedures, and patient anxiety. A recall of 100% means the model picked out all true positives (heart disease cases) in the sample. High recall assures that the model does not miss or neglect any heart disease cases and delivers prompt, accurate diagnoses. The F1 score of 100% reflects this balance of precision and recall; because it accounts for both false positives and false negatives, it captures the model’s overall performance and indicates that the model is dependable and efficient. The confusion matrix provides further evidence of the model’s outstanding performance: all values lie on the diagonal, corresponding to correct classifications, and the model produced no false positive or false negative predictions. This is highly desirable because it shows how well the model can classify heart disease cases. In addition, the AUC of the ROC curve equals 1.0, indicating perfect discriminating ability across all practical classification thresholds. Thus, the model is a good diagnostic tool for predicting heart disease.

The study’s outcome indicates that, in general, the Late Fusion over Convolutional Neural Networks approach proves to be a good and trusty method for predicting heart disease. Numerical and image-like representations can be supported by combining CNN and DNN features to produce a well-performing model. With such excellent accuracy and precision, high recall, F1 score, and a perfect ROC curve, the model is a very handy tool for medical practitioners to assist them in detecting cardiac sickness both precisely and quickly.

The proposed Late Fusion method appears promising when its results are compared with other deep learning and standard ML techniques. Using CNNs to process tabular data as if it were image data is creative and shows how powerful image-processing techniques can be for feature extraction from tabular datasets. Comparing the method with more modern hybrid and deep learning techniques might shed further light on its performance.

Using the Late Fusion approach, it is possible to enhance forecast accuracy by combining several models’ predictions. This method compensates for the weaknesses of the individual models while utilizing their strengths. However, to avoid overfitting and for interpretability reasons, the right trade-off between the advantages of the fusion process and its complexity must be found.

Consequently, further research should use larger and more diverse databases to improve the model’s applicability. This may substantiate that the Late Fusion technique is reliable across domains and data distributions. Further studies of other feature extraction and representation techniques on CNNs may better illustrate their performance with tabular data, for example by probing other forms of data transformation or embedding that improve model performance.

Prediction performance could be improved further by combining Late Fusion with other deep learning models; studies that use more than one data type and learning style can provide valuable insights. Validation tests are needed to assess how the model would perform in practice; they can also help fill gaps in the available dataset and better rate the model’s relevance. Although applying the Late Fusion technique in such settings is viable, future work to optimize its computational requirements may make it more practical. For practical scenarios, ways of minimizing training and inference time without affecting accuracy must be investigated.

In order to know that a model is useful and generalizable to real-world problems, we need to evaluate its performance on a larger and more varied set of data.

Model novel design

Our model demonstrates the effectiveness of Late Fusion over CNN for Heart Disease Prediction, which reliably identifies heart disease by leveraging the advantages of CNNs and DNNs. Patient numerical features and image-like representations of those same features translated into 2D matrices are the two primary forms of data used in the model.

The following creative concepts and contributions were produced in our study on Heart Disease Prediction Using Late Fusion Over Convolutional Neural Networks:

  • We present a late-fusion architecture that learns complementary signals: the CNN takes image-like matrices of the numerical features to capture spatial relations, the DNN learns sequential/structured patterns, and a fusion dense layer learns the joint representation, enhancing human understandability.

  • This design is modality-agnostic and transferable: through swapping or adding branches, it can combine images, tabular data, text, and time-series to make medical-diagnostic predictions, and individual branches tell which spatial versus sequential information informs predictions.

  • The model empirically demonstrates high discrimination (ROC-AUC 1.0), strength, and applicability to clinical environments where disparate patient data are available, facilitating trustworthy, understandable heart-disease risk assessment and applied use.

The CNN branch examines the image-like representations of the numerical properties to look for spatial patterns or structure in the data. It consists of one 1D convolutional layer of 512 filters with the ReLU activation function, followed by max pooling and dropout layers. The extracted features are then flattened, and the image-like data is passed through a dense layer to extract the relevant information. The DNN branch processes the raw numerical properties and data patterns: a dense layer with 128 neurons, batch normalization to stabilize training, and a dropout layer to regularize it. Further dense layers with ReLU activation and dropout were added for feature extraction and optimization. The novel contributions of our study include the application of image-like feature representations, the development of the late fusion model for heart disease prediction, and the thorough fusion of spatial and sequential patterns. The model’s streamlined training process and strong discriminative power further increase its applicability and promise for real-world clinical applications.
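The image-like transformation feeding the CNN branch can be sketched as follows: the 13 numerical features are zero-padded and reshaped into a small 2D matrix so a convolutional layer can scan for spatial patterns. The 4×4 target shape is an assumption for illustration; the text does not specify the matrix dimensions.

```python
import numpy as np

# One patient's 13 numerical features (illustrative values).
features = np.arange(13, dtype=float)

# Zero-pad 13 -> 16 values, then reshape into a 4x4 "image-like" matrix.
padded = np.pad(features, (0, 16 - features.size))
image_like = padded.reshape(4, 4)
print(image_like.shape)  # (4, 4)
```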

Comparative analysis

In this thorough comparative study, we will compare three recent studies that used various machine learning algorithms for heart disease prediction with our innovative methodology, "Late Fusion over Convolutional Neural Networks." The research being compared is shown in Table 12.

Table 12.

Related study to heart disease prediction.

References Approach Accuracy (%) Dataset
10 Soft voting ensemble method 93.44 UCI heart disease data
11 MLP 90.88 UCI heart disease data
13 McCluskey binary classifier (QMBC) 98.36 UCI heart disease data
25 Asynchronous federated learning 86.9 UCI heart disease data
26 CNN model 76 UCI heart disease data
27 Linear discriminant analysis 86.93 UCI heart disease data
28 Ensemble method 98.17 UCI heart disease data
29 Linear regression 96.2 UCI heart disease data
Our Approach Late fusion over convolutional neural networks 99.99 UCI heart disease data

Several approaches have been investigated in heart disease prognosis to improve the precision and efficiency of diagnostic models, with the UCI Heart Disease dataset serving as a common benchmark. Let us juxtapose our late fusion approach over convolutional neural networks with the techniques used in earlier research. Asynchronous federated learning25 attained an accuracy of 86.9%, a promising route to data privacy and decentralized learning, albeit at somewhat lower accuracy than other approaches. The Quine McCluskey Binary Classifier (QMBC)13 presents a robust binary classification model that performs better, with an accuracy of 98.36%, illustrating how well-optimized binary classifiers can forecast cardiac disease. Similarly, the soft voting ensemble approach10 reached 93.44%; merging numerous learning models into ensembles of classifiers generally yields robust performance. An MLP11, a type of neural network, produced an accuracy of 90.88%; neural networks handle complicated problems well in general, but their performance depends on the chosen architecture and parameters.

The CNN model26, usually a successful strategy in image classification, achieved an accuracy of 76%; the tabular rather than visual nature of the dataset may have limited its performance. Linear discriminant analysis27 gave an accuracy of 86.93%, showing that this traditional statistical technique can also solve such classification problems. Another ensemble approach28, at 98.17%, matched the performance of QMBC, again demonstrating the robustness of combining many classifiers for prediction problems. Finally, a linear regression model achieved an accuracy of 96.2%29, showing that standard statistical models can also be useful for predicting heart disease.

In contrast, our method, Late Fusion over Convolutional Neural Networks, achieves an exceptional 99.99% accuracy, surpassing all the methods mentioned. By combining the strengths of its specialized branches through late fusion, our model captures a broader range of nuances and features in the data, achieving higher predictive accuracy. This shows how advanced neural network topologies and fusion techniques can be combined to yield state-of-the-art results in predicting cardiac disease.

Our method combines the strengths of CNNs and DNNs in a late-fusion design. The CNN module builds spatial-pattern representations from image-like numerical features, while the DNN module processes the remaining numerical properties. The two modules are then joined through a dense fusion layer to produce the final prediction.
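The late-fusion step itself can be sketched as concatenation of the two branch outputs followed by a dense sigmoid output. The weights below are random placeholders for illustration, not trained parameters, and the vector sizes are assumed.

```python
import numpy as np

def dense(x, w, b):
    """Fully connected layer: affine transform of the input vector."""
    return x @ w + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
cnn_features = rng.normal(size=8)   # flattened CNN-branch output
dnn_features = rng.normal(size=8)   # dense DNN-branch output

# Late fusion: concatenate branch representations, then classify
fused = np.concatenate([cnn_features, dnn_features])
w, b = rng.normal(size=(16, 1)), np.zeros(1)
risk = sigmoid(dense(fused, w, b))  # predicted heart-disease probability
```

In training, the fusion dense layer learns which parts of each branch's representation matter for the joint prediction.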

Advantages and novelty

Superior accuracy: With an accuracy of 99.99%, the model outperforms comparable mechanisms. Precise forecasts are crucial in medical diagnostics given their impact on patient outcomes. Spatial and sequential patterns: We use CNNs to identify spatial patterns in image-like feature representations, which eases the management of the complex data structures often encountered in medical data. The late fusion design makes it possible to incorporate several data modalities and acquire comprehensive knowledge of patient profiles. Beyond predicting cardiac disease, this approach therefore offers versatility and promise for several other healthcare domains.
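As an illustration of the image-like feature representation mentioned above, the sketch below min-max scales one UCI-style feature row and zero-pads it into a small square matrix. The 4×4 shape and padding scheme are assumptions for demonstration, not the exact transformation used in the paper.

```python
import numpy as np

def to_grayscale_image(row, side=4):
    """Min-max scale a feature row to [0, 1], zero-pad to side*side,
    and reshape into a square image-like matrix for a CNN."""
    row = np.asarray(row, dtype=float)
    lo, hi = row.min(), row.max()
    scaled = (row - lo) / (hi - lo) if hi > lo else np.zeros_like(row)
    padded = np.zeros(side * side)
    padded[: scaled.size] = scaled
    return padded.reshape(side, side)

# One patient record with the 13 UCI heart-disease features (toy values)
patient = [63, 1, 3, 145, 233, 1, 0, 150, 0, 2.3, 0, 0, 1]
img = to_grayscale_image(patient)   # 4x4 "grayscale" matrix
```

The resulting matrix can be fed to a 2D convolutional branch in place of the raw tabular row.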


Disadvantages of our study

Although our “Late Fusion over CNN” technique is incredibly efficient and has many advantages, we cannot overlook the drawbacks and what should be considered.

  • Data requirements Labeled data is essential to our technique’s efficacy. Annotating and collecting medical datasets can be expensive in human effort and prone to biases, which may limit the usefulness and generality of the resulting model.

  • Computational resources CNNs undeniably demand significant computational resources, especially when dealing with large-scale image-like representations. In resource-constrained scenarios, the computing demands of training and inference may make real-time application infeasible.

  • Interpretability Interpretability is the ability to explain why and how conclusions are drawn by deep learning models. CNNs in particular are commonly regarded as ‘black box’ algorithms whose findings are difficult to analyze. Interpretability significantly affects trust building and clinical acceptability in medical applications.

  • Intrinsic complexity Deep learning models are highly complex and therefore prone to substantial overfitting, especially when the dataset is small. Suitable regularization techniques and data augmentation strategies are needed to mitigate this risk.

  • Data imbalance Medical datasets are often imbalanced, which can result in biased predictions and unreliable results. Balancing positive and negative instances in the training process is therefore crucial for fair and equitable outcomes.

  • Generalizability The results show that the model is highly accurate when predicting heart disease on the UCI Heart Disease data. However, accuracy may change on other datasets or different populations, so further testing and validation with various real-world datasets are needed to assess generalizability.
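One standard remedy for the imbalance issue noted in the list above is inverse-frequency class weighting, which makes minority-class (e.g. positive heart-disease) cases count more during training. The helper below is a generic sketch, not part of the study's pipeline.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count_c),
    where n is the number of samples and k the number of classes."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Toy imbalanced labels: 6 negatives, 2 positives
weights = inverse_frequency_weights([0, 0, 0, 0, 0, 0, 1, 1])
# The minority class (1) receives the larger weight
```

Such per-class weights can be passed to most training loops as sample or class weights so the loss penalizes minority-class errors more heavily.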

Our Late Fusion over Convolutional Neural Networks model significantly aids heart disease prediction. It combines CNNs and DNNs in a late fusion architecture, allowing the model to capture intricate patterns across many different input types. This design overcomes shortcomings of previous methodologies, namely the need for model ensembles and asynchronous training. With an accuracy of 99.99%, the model shows exceptional performance and is a potential candidate for clinical applications. The primary factors behind its effectiveness in predicting heart disease are its ability to process diverse data and analyze spatial patterns. The methodology shows great potential to transform predictive analytics in healthcare through artificial intelligence, serving as an AI-enabled tool that enhances patient outcomes and supports personalized medical intervention.

The findings require further validation and testing on larger and more diverse datasets to adequately assess how they apply to other contexts and to broaden their scope. Our research methodology holds great promise for transforming the healthcare sector and improving patients’ quality of life.

The proposed method is shown to be more accurate and thus fundamentally improves the ability to predict cardiac disease. The system nevertheless faces challenges in data requirements, interpretability, and computational resources. Even so, the model produces valuable results. The proposed methodology suggests that advanced AI-based predictive healthcare analytics could be implemented, offering many advantages to patients and healthcare professionals given the degree of associated risk. We aim to address these challenges and improve the reliability of the results through rigorous validation across various datasets.

Limitations

While our "Late Fusion over CNN" provides excellent accuracy and a significant number of benefits, some downsides should be considered.

  • Data scope The data used in this research come from the UCI Machine Learning Repository and comprise 303 records with 13 features. Because this dataset is small, the findings may not generalize beyond this context. Nevertheless, the proposed Late Fusion approach is a good benchmark for this dataset; more data would yield more meaningful and credible information about the model.

  • Model complexity Late Fusion combines the predictions of several specialized models. This added complexity improves accuracy, but it also leads to longer training times and higher processing requirements, which may be undesirable where resources are scarce.

  • Feature representation The tabular data are transformed into grayscale images that are presented to the CNN. This technique relies on the CNN’s ability to exploit the features of the data, yet it may capture only certain nuances of tabular data; its usefulness depends on the characteristics and type of the dataset.

  • Evaluation metrics Accuracy is the most prominent measure used to evaluate the model, but accuracy alone cannot capture the model’s performance, especially when datasets are imbalanced. Adding metrics such as precision, recall, F1 score, and AUC gives a better assessment of the model’s performance.

  • Absence of real-world validation The model’s performance was tested using historical UCI Machine Learning Repository data. In real-world applications, however, new challenges and factors may emerge that were not included in the dataset. Future studies must validate the model in real settings to ensure it applies in practical scenarios.
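The evaluation-metrics point above can be made concrete with a small self-contained computation of precision, recall, and F1 from a confusion count. This is a generic illustration rather than the study's evaluation code; the label vectors are toy values.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for binary labels (1 = disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)  # 0.75, 0.75, 0.75
```

On imbalanced data these metrics, unlike raw accuracy, reveal whether the minority (positive) class is actually being detected.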

Conclusion

In conclusion, the research carried out in this study demonstrates the benefits of the late fusion strategy with CNNs and DNNs for predicting heart disease. The proposed model achieves an impressive accuracy of 99.99% by employing a CNN for spatial pattern identification and a DNN for sequential pattern analysis, demonstrating its effectiveness in identifying cardiac abnormalities. Other models, such as the Soft Voting Ensemble Method10, although helpful, do not equal the efficiency of the Late Fusion technique proposed in this paper. The high degree of accuracy thus represents a clear gain.

The combination of CNNs and DNNs increases the efficiency of the predictions and makes the model more explainable. This is a crucial factor in medical diagnosis, where it is essential to understand why decisions were made alongside the results. This is a significant improvement over methods such as Multi-Layer Perceptrons (MLP)11, Linear Discriminant Analysis (LDA)27, and Linear Regression29, which often struggle to handle the complexity and dimensionality of medical data. By incorporating Late Fusion, our model can include both image-like and numerical variables, performing a more comprehensive analysis than ordinary models and capturing finer details and variations that other models may miss.

It is also essential to acknowledge that the outcomes of the present investigation have significant practical applications. The reliability of the Late Fusion model, characterized by high accuracy and flexibility, makes it an invaluable tool for physicians, helping to diagnose heart diseases in their early stages. Moreover, the modularity of the presented technique allows it to be adapted to other fields of healthcare that involve both visual and numeric data, such as cancer and neurological diseases. Such flexibility demonstrates the potential of the proposed model to contribute to individualized therapy and to set new standards for diagnostic accuracy in medicine.

On balance, the results of the present study set a new standard for identifying heart disease risk and open new horizons for the further advancement of deep learning approaches. The work presented here opens the door for discoveries in medical prediction systems by showing that accurate and robust systems can be developed using advanced neural network models and fusion schemes. Such developments can enhance the accuracy of medical predictions and outcomes in medical science.

Future work

Future work should extend and standardize both the inputs and the fusion. In addition to existing imaging and tabular data, incorporating genomics, real-time wearables, and full EHRs, along with effective preprocessing and temporal alignment, would enable richer late-fusion pipelines. The fusion itself should move beyond simple concatenation to models that dynamically weight modalities and features using attention and hierarchical principles. In parallel, interpretability should be enhanced through feature-attribution maps, modality-level contribution analysis, and clinician-facing explanations so that decisions can be audited, trusted, and acted upon.
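A minimal sketch of the attention-style modality weighting mentioned above, as opposed to plain concatenation: each branch embedding receives a softmax weight derived from a score. The scores and embeddings here are toy values, and this gating scheme is only one of many possible designs, not one evaluated in this study.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # numerically stable softmax
    return e / e.sum()

def attention_fuse(branches, scores):
    """Weight each modality embedding by its (normally learned) score
    and sum the results, instead of concatenating them."""
    w = softmax(np.asarray(scores, dtype=float))
    fused = sum(wi * b for wi, b in zip(w, branches))
    return fused, w

cnn_emb = np.ones(4)       # toy CNN-branch embedding
dnn_emb = 2 * np.ones(4)   # toy DNN-branch embedding
fused, w = attention_fuse([cnn_emb, dnn_emb], scores=[0.0, 0.0])
# equal scores -> equal weights -> each fused entry is 1.5
```

In a trained system the scores would come from a small network over the branch embeddings, letting the model down-weight uninformative modalities per patient.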

Generalizability requires strict, bias-conscious validation across ages, sexes, ethnicities, comorbidities, and care settings, using multi-centre datasets and prospective studies, with methods to detect and overcome dataset shift and fairness gaps. Finally, clinical translation needs usable interfaces, workflow interoperability, and adherence to safety, privacy, and regulatory requirements, alongside impact assessments and randomized trials. Co-development with clinicians and health systems will turn promising models into valuable, reliable tools for patient-level risk prediction and personalized care at scale.

Acknowledgements

This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R435), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia and Prince Sultan University, Riyadh, Saudi Arabia. Special acknowledgement to Automated Systems & Soft Computing Lab (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia.

Abbreviations

CVD

Cardiovascular disease

ESC

European Society of Cardiology

UCI

University of California, Irvine

CNN

Convolutional neural network

CAD

Coronary artery disease

RF

Random forest

NB

Naïve Bayes

PCA

Principal component analysis

CHD

Coronary heart disease

AFLCP

Asynchronous federated deep learning approach for cardiac prediction

AUC

Area under the curve

WHO

World Health Organization

DNN

Deep neural networks

QMBC

Quine McCluskey binary classifier

1D-CNN

One-dimensional convolutional neural network

SVM

Support vector machine

LR

Logistic regression

QMCBC

Quantum Monte Carlo Bayesian classifier

FE

Feature extraction

CBR

Case-based reasoning

ROC

Receiver operating characteristic

Author contributions

Study conception and design: Deema Mohammed AlSekait, M. Zakariah; data collection: M. Zakariah; analysis and interpretation of results: M. Zakariah, Syed Umar Amin, draft manuscript preparation: Syed Umar Amin, M. Zakariah, P. Dubey. All authors reviewed the results and approved the final version of the manuscript.

Funding

Open access funding provided by Symbiosis International (Deemed University). The authors would like to thank Princess Nourah bint Abdulrahman University for funding this project through the Researchers Supporting Project (PNURSP2025R435) and this research was funded by the Prince Sultan University, Riyadh, Saudi Arabia.

Data availability

Dataset is available on reasonable request. Please contact the corresponding author Dr. Parul Dubey to get the dataset.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.Løkke, A. et al. Exacerbations predict severe cardiovascular events in patients with COPD and stable cardiovascular disease—A nationwide, population-based cohort study. Int. J. Chron. Obstruct. Pulmon. Dis. 419–429 (2023). [DOI] [PMC free article] [PubMed]
  • 2.Pandian, N. G. et al. Recommendations for the use of echocardiography in the evaluation of rheumatic heart disease: A report from the American Society of Echocardiography. J. Am. Soc. Echocardiogr.36(1), 3–28 (2023). [DOI] [PubMed] [Google Scholar]
  • 3.Ullah, M. et al. Stent as a novel technology for coronary artery disease and their clinical manifestation. Curr. Probl. Cardiol.48(1), 101415 (2023). [DOI] [PubMed] [Google Scholar]
  • 4.Abdul-Rahman, T. et al. The common pathobiology between coronary artery disease and calcific aortic stenosis: Evidence and clinical implications. Prog. Cardiovasc. Dis. (2023). [DOI] [PubMed]
  • 5.Su, H. et al. The lived experience of frailty in patients aged 60 years and older with heart failure: A qualitative study. Asian Nurs. Res. (Korean. Soc. Nurs. Sci) (2023). [DOI] [PubMed]
  • 6.Ahmed, U., Lin, J.C.-W. & Srivastava, G. Multivariate time-series sensor vital sign forecasting of cardiovascular and chronic respiratory diseases. Sustain. Comput. Inform. Syst.38, 100868 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Hossain, M. E., Uddin, S. & Khan, A. Network analytics and machine learning for predictive risk modelling of cardiovascular disease in patients with type 2 diabetes. Expert Syst. Appl.164, 113918 (2021). [Google Scholar]
  • 8.Kumar, M. D. & Ramana, K. V. Cardiovascular disease prognosis and severity analysis using hybrid heuristic methods. Multimed. Tools Appl.80(5), 7939–7965 (2021). [Google Scholar]
  • 10.Chandrasekhar, N. & Peddakrishna, S. Enhancing heart disease prediction accuracy through machine learning techniques and optimization. Processes11(4), 1210 (2023). [Google Scholar]
  • 11.García-Ordás, M. T., Bayón-Gutiérrez, M., Benavides, C., Aveleira-Mata, J. & Benítez-Andrades, J. A. Heart disease risk prediction using deep learning techniques with feature augmentation. Multimed. Tools Appl. 1–15 (2023).
  • 12.Gamboa-Montero, J. J., Alonso-Martin, F., Marques-Villarroya, S., Sequeira, J. & Salichs, M. A. Asynchronous federated learning system for human–robot touch interaction. Expert Syst. Appl.211, 118510 (2023). [Google Scholar]
  • 13.Kapila, R., Ragunathan, T., Saleti, S., Lakshmi, & Ahmad, M. W. Heart disease prediction using novel quine McCluskey binary classifier (QMBC). IEEE Access (2023).
  • 14.Wani, N. A., Kumar, R. & Bedi, J. DeepXplainer: An interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput. Methods Programs Biomed.243, 107879 (2023). [DOI] [PubMed] [Google Scholar]
  • 15.Singh, S., Wani, N. A., Kumar, R. & Bedi, J. DiaXplain: A transparent and interpretable artificial intelligence approach for Type-2 diabetes diagnosis through deep learning. Comput. Electr. Eng.126, 110470 (2025). [Google Scholar]
  • 16.Wani, N. A., Kumar, R., Bedi, J. & Rida, I. Explainable AI-driven IoMT fusion: Unravelling techniques, opportunities, and challenges with Explainable AI in healthcare. Inf. Fusion110, 102472–102472 (2024). [Google Scholar]
  • 17.Asif, D., Bibi, M., Arif, M. S. & Mukheimer, A. Enhancing heart disease prediction through ensemble learning techniques with hyperparameter optimization. Algorithms16, 308. 10.3390/a16060308 (2023). [Google Scholar]
  • 18.Hanbay, D. An expert system based on least square support vector machines for diagnosis of the valvular heart disease. Expert Syst. Appl.36(3), 4232–4238 (2009). [Google Scholar]
  • 19.Bizimana, P. C. et al. Automated heart disease prediction using improved explainable learning-based technique. Neural Comput. Appl.36, 16289–16318. 10.1007/s00521-024-09967-6 (2024). [Google Scholar]
  • 20.Rehman, M. U. et al. Predicting coronary heart disease with advanced machine learning classifiers for improved cardiovascular risk assessment. Sci. Rep.15, 13361. 10.1038/s41598-025-96437-1 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Muthukaruppan, S. & Er, M. J. A hybrid particle swarm optimization based fuzzy expert system for diagnosing coronary artery disease. Expert Syst. Appl.39(14), 11657–11665 (2012). [Google Scholar]
  • 22.Long, N. C., Meesad, P. & Unger, H. A highly accurate firefly based algorithm for heart disease prediction. Expert Syst. Appl.42(21), 8221–8231 (2015). [Google Scholar]
  • 23.Jin, S. & Lee, J. Effective music skip prediction based on late fusion architecture for user-interaction noise. Expert Syst. Appl.238, 122098 (2023). [Google Scholar]
  • 25.Khan, M. A. et al. Asynchronous federated learning for improved cardiovascular disease prediction using artificial intelligence. Diagnostics13(14), 2340 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Khan Mamun, M. M. R. & Elfouly, T. Detection of cardiovascular disease from clinical parameters using a one-dimensional convolutional neural network. Bioengineering10(7), 796 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Ghosh, A. & Jana, S. A study on heart disease prediction using different classification models based on cross validation method. Int. J. Eng. Res. Technol.10.17577/IJERTV11IS060029 (2022). [Google Scholar]
  • 28.Jan, M., Awan, A. A., Khalid, M. S. & Nisar, S. Ensemble approach for developing a smart heart disease prediction system using classification algorithms. Res. Rep. Clin. Cardiol. 33–45 (2018).
  • 29.Sapra, V. et al. Integrated approach using deep neural network and CBR for detecting severity of coronary artery disease. Alexandria Eng. J.68, 709–720 (2023). [Google Scholar]
  • 30.Li, J., Si, Y., Xu, T. & Jiang, S. Deep convolutional neural network based ECG classification system using information fusion and one-hot encoding techniques. Math. Probl. Eng.2018, 1–10 (2018). [Google Scholar]
  • 31.Veerabaku, M. G. et al. Intelligent Bi-LSTM with architecture optimization for heart disease prediction in WBAN through optimal channel selection and feature selection. Biomedicines11(4), 1167 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Moshawrab, M., Adda, M., Bouzouane, A., Ibrahim, H. & Raad, A. Reviewing multimodal machine learning and its use in cardiovascular diseases detection. Electronics12(7), 1558 (2023). [Google Scholar]
  • 33.Kumar, A. et al. Flamingo-optimization-based deep convolutional neural network for IoT-based arrhythmia classification. Sensors23(9), 4353 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Mostafa, G., Mahmoud, H., Abd-El-Hafeez, T. & ElAraby, M. E. Feature reduction for hepatocellular carcinoma prediction using machine learning algorithms. J. Big Data10.1186/s40537-024-00944-3 (2024). [Google Scholar]
  • 35.Abdel, A., Mabrouk, O. M. & Abd El-Hafeez, T. Employing machine learning for enhanced abdominal fat prediction in cavitation post-treatment. Sci. Rep.10.1038/s41598-024-60387-x (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Omar, A. & El-Hafeez, T. A. Optimizing epileptic seizure recognition performance with feature scaling and dropout layers. Neural Comput. Appl.10.1007/s00521-023-09204-6 (2023). [Google Scholar]
  • 37.Abdel Hady, D. A. & Abd El-Hafeez, T. Predicting female pelvic tilt and lumbar angle using machine learning in case of urinary incontinence and sexual dysfunction. Sci. Rep.13(1), 17940. 10.1038/s41598-023-44964-0 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Abdel, A. & Abd El-Hafeez, T. Revolutionizing core muscle analysis in female sexual dysfunction based on machine learning. Sci. Rep.10.1038/s41598-024-54967-0 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Hassan, E., Abd El-Hafeez, T. & Shams, M. Y. Optimizing classification of diseases through language model analysis of symptoms. Sci. Rep.14(1), 1507. 10.1038/s41598-024-51615-5 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Abdel, A. & Abd El-Hafeez, T. Utilizing machine learning to analyze trunk movement patterns in women with postpartum low back pain. Sci. Rep.10.1038/s41598-024-68798-6 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Andras, J., William, S., Matthias, P. & Robert, D. Heart disease. UCI Machine Learning Repository (1988). 10.24432/C52P4X.
