Abstract
Coronary Computed Tomography Angiography (CCTA) is a non-invasive imaging technique used to visualize the coronary arteries and diagnose coronary artery disease. It provides detailed 3D images which helps in identifying blockages, stenosis and plaques. However, CCTA images often suffer from noise, low contrast, and motion artifacts, which complicate accurate segmentation and analysis. The proposed work addresses the challenges in detecting and classifying coronary artery disease (CAD) from CCTA images. For accurate coronary artery segmentation, Region-Guided Mask R-CNN (RG-Mask R-CNN) combines region growing for initial region identification with Mask R-CNN for precise instance segmentation. This hybrid approach reduces the complexity of the structures in CCTA images and provides better segmentation accuracy. For feature extraction, Haralick-ResNet Fusion (HRF) captures the texture information using Haralick features and ResNet for high-level feature extraction to handle the difficulty of subtle variation differentiation in CCTA images. Finally, a Deep Convolutional Network (DeepConvNet) combines deep learning and convolutional layers to classify CAD efficiently based on the features extracted. This integrated framework offers an accurate solution (98.3%) for CAD detection from CCTA images by overcoming common issues such as noise and complex arterial structures, while improving diagnostic performance.
Keywords: Coronary artery, Coronary computed tomography angiography, Region-guided mask R-CNN, Haralick-ResNet fusion, Deep convolutional network, Medical image segmentation
Subject terms: Cardiology, Computational biology and bioinformatics, Engineering, Health care, Mathematics and computing
Introduction
Coronary Computed Tomography Angiography is a high-resolution imaging technique that allows for the generation of detailed 3D images of the coronary arteries, making it possible to evaluate the existence of CAD. CCTA has dramatically changed the way CAD is diagnosed because it allows for non-invasive visualization of the coronary arteries in detail. With the advancement in CT imaging technology, CCTA has become more popular because it can detect blockages, stenosis, and plaques in the coronary arteries. These detailed visualizations assist clinicians in determining the severity of coronary artery disease and, in turn, guide therapeutic interventions1. However, CCTA images also pose unique challenges. They often exhibit artifacts such as motion blur, noise, and low contrast, especially in regions around small blood vessels or in patients with high BMI. Moreover, structures may overlap in CCTA images, and smaller coronary vessels can easily be confused with neighbouring tissues. These factors make CCTA image analysis and segmentation, which are essential steps in CAD detection, more difficult2.
Despite CCTA’s strong utility as a diagnostic tool, the images obtained are often quite challenging to interpret. Noise remains one of the primary issues in CCTA images, as it can stem from limitations within the scanner itself, patient movement, and low signal-to-noise ratios. This noise significantly limits the clear delineation of coronary artery boundaries and other structures. Furthermore, CCTA images have low contrast between the coronary arteries and the surrounding tissues, especially in regions where the arteries are tiny or have complicated shapes. Such low-contrast boundaries can blur and might be missed during the segmentation process, which is one of the most important procedures for diagnosing CAD. In addition, the coronary arteries often overlap with other structures, including veins and connective tissues, complicating the segmentation process. Motion artifacts, caused by patient movement during the scan, distort the image. Calcified plaques, often present in CAD, add further difficulty because they are hard to distinguish from surrounding tissues. Therefore, advanced segmentation and feature extraction techniques are required to effectively analyse such complex CCTA images3,4.
The traditional approaches use methods such as thresholding, region growing, or edge detection to segment CCTA images. Thresholding methods fail to accurately identify complex structures like small coronary arteries because of low contrast and only segment an image based on pixel intensity values. Region Growing selects seed points and expands regions based on intensity similarity. This approach is sensitive to noise and artifacts, making it less reliable for CCTA images. Edge Detection, based on gradient calculations, can be effective for detecting boundaries but often fails to capture the full context of coronary arteries, especially in the presence of overlapping structures or noise5.
The feature extraction methods of traditional approaches heavily rely on handcrafted features. One of them is the Haralick feature, which is obtained from a Gray-Level Co-occurrence Matrix (GLCM). Even though these texture-based features offer valuable information to classify patterns in medical images, they may not be enough to capture the complexity of variations found in CCTA images. Other methods, which include Scale-Invariant Feature Transform (SIFT) and Histogram of Oriented Gradients (HOG), are also being used but not always successful at capturing high-level, abstract patterns that are vital for accurate CAD diagnosis. Methods of classification involve Support Vector Machines (SVM) and K-Nearest Neighbours (KNN) and usually base their decisions on a combination of manually extracted features. However, these models fail to generalize across different cases, especially when dealing with the inherent complexity and variability of CCTA images leading to suboptimal performance6.
To overcome these limitations, the current study integrates advanced deep learning-based techniques with traditional methods. Segmentation is provided by Region-Guided Mask R-CNN (RG-Mask R-CNN), which aims to alleviate issues such as noise, low contrast, and complex structures seen in CCTA images. Region growing first identifies possible areas of interest, after which Mask R-CNN performs fine-grained instance segmentation to enable accurate detection of coronary arteries7. This hybrid approach enhances segmentation accuracy by integrating region-based initialization with deep learning for detailed boundary extraction. For feature extraction, the proposed HRF approach integrates Haralick features, which are texture-based features derived from the GLCM, with ResNet, a deep residual network. Haralick features capture the most important texture information in the arteries and surrounding tissues, while ResNet extracts high-level deep features, overcoming the challenge of subtle variations in CCTA images. This fusion enhances the robustness and discriminative power of the feature extraction process. Finally, classification is performed by a DeepConvNet that automatically learns features from the data8.
Conventional segmentation, feature extraction, and classification methods do not work well with complex, noisy CCTA images. These methods rest on manual feature engineering, which is time-consuming and may miss the minute details required for effective CAD detection. In contrast to existing approaches based on Mask R-CNN, the proposed solution uses region-guided priors and attention mechanisms to improve coronary segmentation in noisy CCTA images. This Region-Guided Mask R-CNN, combined with feature extraction via Haralick-ResNet Fusion and classification with DeepConvNet, gives a comprehensive solution to these challenges. By integrating deep learning-based techniques for both segmentation and classification, this work eliminates manual feature extraction and offers effective, accurate end-to-end performance9,10.
Main contribution of the proposed work
Region-Guided Mask R-CNN for the accurate segmentation of coronary arteries in CCTA images which addresses issues like noise, low contrast and complex structures.
A new fusion model that combines ResNet with the Haralick features for developing robust texture-based feature extraction.
The utilization of DeepConvNet for the automatic classification of CAD without manually extracting features can enhance the diagnostic accuracy.
Enhanced end-to-end approach for segmentation, feature extraction, and classification addressing the limitations of traditional methods.
It enhances the detection of coronary artery disease by effectively handling artifacts, overlapping structures and subtle variations in CCTA images.
The paper is structured in the following manner. Section II discusses the literature review on various CAD detection approaches. Section III explains the proposed work, which comprises the Region-Guided Mask R-CNN model for segmentation and the Haralick-ResNet Fusion method for feature extraction, followed by the Deep Convolutional Network (DeepConvNet) for classification. Section IV discusses the results of the proposed system on a dataset of CCTA images, comparing its performance with traditional methods in terms of accuracy, precision, recall, and computational efficiency. Section V concludes the proposed work and outlines directions for future work.
Literature review
Han et al. (2020) applied deep learning techniques to assess coronary artery stenosis in CCTA images, using CNNs to detect and classify the severity of stenosis. The model provided high accuracy for CAD assessment but may require a large dataset and perform poorly on noisy images or smaller features11. Hu et al. (2024) analysed CAD risk in type 2 diabetes mellitus patients in association with Coronary Artery Calcium (CAC) scores. The approach, based solely on non-invasive CAC scores, enables low-cost estimation but relies heavily on accurate CAC scores that may not always be readily obtained12. Ihdayhid et al. (2024) introduced unsupervised deep learning for detecting coronary artery stenosis and high-risk plaque on CCTA images, an automatic system that requires no manual annotations; it streamlines diagnosis but, given few labels, could miss complex and very rare cases13.
Paul et al. (2022) assessed a deep learning model for automatic stenosis detection from coronary CT angiography. The authors applied CNN-based models to detect stenosis, enabling faster and more accurate detection, although the approach may face challenges with false positives or negatives where stenosis is subtle14. Wu et al. (2025) emphasized CCTA-based pericoronary adipose tissue imaging for predicting Major Adverse Cardiovascular Events (MACE). The model examines the fat tissue surrounding the coronary arteries and gives an early prognosis, but depends on accurate CCTA scans15. Chen et al. (2024) applied positive-unlabelled learning to segmentation in CCTA images. The method exploits abundant, freely available unlabelled data for segmentation but may make errors when distinguishing positive from unlabelled samples16.
Ainiwaer et al. (2024) evaluated the feasibility of using heart-sound signals to predict obstructive coronary artery disease with deep learning. This model is less expensive than conventional imaging modalities but depends on the quality of the heart sounds and cannot be considered fully reliable17. Huang et al. (2023) proposed an automatic Coronary Artery Disease Reporting and Data System (CAD-RADS) based on CNNs for analysing coronary CT angiography. This approach standardizes the reporting of CAD findings but is limited by the quality of CCTA scans and may not handle all forms of CAD18. Dutta et al. (2020) proposed an efficient CNN for predicting coronary heart disease, focused on feature extraction and classification from CCTA images. The model offers an efficient solution for CAD prediction but may not generalize well across all patient demographics or scan qualities19.
Alizadehsani et al. (2016) proposed computational intelligence methods for coronary artery disease detection, using algorithms such as neural networks. This approach combines multiple methods for better detection performance, but the methods can be computationally expensive and require large datasets for training20. Mienye & Jere (2024) utilized an optimized ensemble learning approach with explainable AI techniques. Although the model provides insight during the prediction phase, ensemble methods incur high computational expense21. Kusunose et al. (2020) developed a deep learning approach for automatic assessment of regional wall motion abnormalities in echocardiographic images. It automates wall motion analysis and improves efficiency, although it is confined to certain types of heart disease and may not generalize to other cases22.
Saboor et al. (2022) presented machine learning algorithms that predict heart disease and improve prediction quality. The work offered a data-driven approach to disease prediction but demands large, high-quality datasets for precise results23. El-Hasnony et al. (2022) explored multi-label active learning-based machine learning models. Active learning improves model efficiency by using fewer labelled samples, but multi-label problems can complicate training and result interpretation24. Haq et al. (2018) proposed a hybrid intelligent system that employs multiple algorithms together to increase prediction accuracy; however, such hybrid systems tend to be complicated and difficult to implement25.
Kim & Kang (2017) proposed a neural network-based method that incorporates feature correlation for better risk assessment, although it may require thorough feature engineering to achieve optimal results26. Angraal et al. (2020) employed machine learning that provides useful prognostic information for the management of heart failure but may apply only to selected patient populations27. Sharma et al. (2020) explored artificial plant optimization algorithms to identify heart rate and disease with machine learning. The approach offers a new optimization technique, but its performance depends on how well it handles noisy data28. Mienye & Sun (2021) explored particle swarm optimization and stacked sparse autoencoders. The hybrid method improves prediction accuracy, but its complexity increases computational requirements29 (Table 1).
Table 1.
Literature review on existing methods.
| Ref. | Author(s) (Year) | Dataset | Methodology | Accuracy | Challenges |
|---|---|---|---|---|---|
| 13 | Hu et al. (2024) | Coronary artery calcium scores (CAC) | Deep learning-based type 2 diabetes using CNN | 86% | Relies on accurate CAC score data |
| 14 | Ihdayhid et al. (2024) | CCTA images | Unsupervised deep learning technique | 90% | Struggles with complex or rare cases |
| 17 | Chen et al. (2024) | CCTA images | Positive-unlabelled learning | 88% | May be prone to errors in distinguishing samples |
| 18 | Ainiwaer et al. (2024) | Heart-sound signals | Deep learning for predicting obstructive CAD | 85% | Dependent on signal quality, may not replace imaging |
| 22 | Mienye & Jere (2024) | Heart disease datasets | Optimized ensemble learning with explainable AI | 86% | Computationally intensive |
| 19 | Huang et al. (2023) | CCTA images | CNN-based CAD-RADS | 89% | Limited by CCTA scan quality, may not handle all CAD types |
| 15 | Wu et al. (2025) | CCTA-based pericoronary adipose tissue images | Predictive deep learning for major adverse cardiovascular events (MACE) using CNN | 91% | Clinical validation limited, depends on scan accuracy |
| 12 | Han et al. (2020) | CCTA images | CNN | 89% | Requires large datasets, sensitive to noisy images |
| 20 | Dutta et al. (2020) | CCTA images | Efficient CNN | 90% | Generalization to diverse demographics may be limited |
| 23 | Kusunose et al. (2020) | Echocardiographic images | CNN | 90% | Limited to specific heart conditions |
| 24 | Saboor et al. (2022) | Heart disease datasets | SVM with RF | 88% | Requires high-quality labeled data |
| 25 | El-Hasnony et al. (2022) | Heart disease datasets | Multi-label active learning using CNN | 89% | Complexity in multi-label problems |
| 26 | Haq et al. (2018) | Heart disease datasets | Neural network with feature selection | 87% | Complexity of hybrid systems |
| 21 | Alizadehsani et al. (2016) | CCTA images | SVM + KNN | 88% | Requires large datasets, computationally expensive |
| 27 | Kim & Kang (2017) | Heart disease datasets | Neural network with feature correlation analysis | 86% | Requires thorough feature engineering |
| 28 | Angraal et al. (2020) | Heart failure datasets | CNN | 90% | Limited to heart failure patients |
| 29 | Sharma et al. (2020) | Heart disease datasets | Artificial plant optimization algorithm | 86% | Performance dependent on noisy data |
| 30 | Mienye & Sun (2021) | Heart disease datasets | Particle swarm optimization and stacked sparse autoencoder | 88% | Increases computational complexity |
The major issues identified in the reviewed studies include dependency on high-quality labelled data, computational complexity, and the limited applicability of certain models to specific datasets or conditions. Such shortcomings can limit generalization across a wider range of patient demographics and clinical settings. The work proposed here addresses these issues through a hybrid deep learning pipeline that combines region-guided segmentation, fused texture and deep feature extraction, and convolutional classification. This approach enhances the robustness of the model, reduces computational burden, and improves its generalization capability across CCTA datasets.
Proposed work
Region-Guided Mask R-CNN (RGM) improves Mask R-CNN by adding region-based priors that refine segmentation, especially in cluttered environments, with better object differentiation. It modifies the Region Proposal Network (RPN) to generate and refine object proposals based on region-specific guidance. The Haralick-ResNet Fusion method enhances feature extraction in coronary CCTA images by combining Haralick features derived from the Gray-Level Co-occurrence Matrix with the deep hierarchical representation of ResNet, capturing the subtle variations that occur in arterial structures. DeepConvNet is applied for classification, using multiple convolutional layers to learn hierarchical features ranging from low-level texture to high-level object representations.
Region-guided mask R-CNN (RGM)
RGM is an extension of the Mask R-CNN architecture designed to improve segmentation performance by incorporating region-based guidance. The irregular shapes, overlapping structures, and low-contrast boundaries found in CCTA images are handled by combining region-based priors with deep instance segmentation. Region guidance restricts proposal generation to anatomically relevant areas, which reduces the potential for false positives caused by illumination artifacts or background textures. The model is further refined using region-based priors, which improve object distinction in cluttered environments and support fine-grained segmentation.
In Fig. 1, the RPN of the RGM model functions like its counterpart in Mask R-CNN but with region-guided modifications. It produces candidate object proposals by sliding a small network over the feature map generated by the backbone, and then refines these proposals with region-specific guidance, filtering out irrelevant or low-confidence proposals. The RPN scores each proposal as in Eq. (1), where $p_k$ is the probability that the $k$th region contains an object and $\mathrm{IoU}(R_k, G_k)$ is the intersection over union between the predicted region $R_k$ and the ground-truth region $G_k$:

$$s_k = p_k \cdot \mathrm{IoU}(R_k, G_k) \tag{1}$$
Fig. 1.

Working flow of the proposed Region-Guided Mask R-CNN.
Region-Guided Mask R-CNN improves the Mask R-CNN architecture by adding region-based guidance and attention during segmentation, for better performance in challenging environments with occlusions or overlapping objects. RGM employs a Region Proposal Network (RPN) to obtain object proposals, further enriched with region-specific guidance to discard irrelevant proposals. The region-guided attention mechanism then highlights the areas of the feature map relevant to the target structures, enhancing the model’s ability to focus on critical regions.
$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{2}$$

where $A$ is the predicted segmentation mask and $B$ is the ground-truth mask.
RoI Align is used to preserve spatial precision during feature extraction, and a mask prediction head generates masks whose overlap with the ground truth is measured by the Dice coefficient in Eq. (2). A combined loss function is used to train the model, incorporating classification, bounding-box regression, and mask segmentation losses. Overall, the RGM model improves the segmentation of fine-grained and occluded objects by refining region proposals, applying selective focus, and addressing the misalignment issues inherent in traditional RoI pooling.
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{3}$$

In Eq. (3), the Jaccard Index is used to evaluate the region-guided attention mechanism for feature learning in RGM. This mechanism is a key innovation of RGM, as it improves the feature learning process. It utilizes spatial guidance from region proposals to weight different areas of the feature map, emphasizing regions of interest based on spatial cues and contextual information. Such selective focus improves mask predictions, particularly for overlapping or occluded objects. The attention is computed as in Eq. (4), where $F(x)$ is the feature map at location $x$, $W_g$ is a learnable weight matrix for the region guidance, and $A(x)$ is the attention score applied to the feature map, ensuring that only the relevant regions of the feature map are emphasized:

$$A(x) = \sigma\big(W_g\, F(x)\big), \qquad \tilde{F}(x) = A(x)\, F(x) \tag{4}$$
After region proposals are refined and guided by attention, the RGM model proceeds with RoI Align, which addresses the misalignment inherent in RoI pooling by performing bilinear interpolation to preserve spatial precision. The refined features are then passed through the mask head, which generates a binary mask for each object. The mask prediction for the $i$th proposal is computed by Eq. (5), where $M_i$ is the predicted mask for the $i$th region proposal, $W_m$ is a learnable mask weight matrix, and $F_i$ is the feature extracted from the $i$th region proposal after RoI Align; the sigmoid produces a binary segmentation mask indicating object presence at each pixel:

$$M_i = \sigma\big(W_m\, F_i\big) \tag{5}$$
The overall loss function for training the region-guided mask consists of multiple components: the classification loss, the bounding-box regression loss, and the mask segmentation loss. The total loss $L$ combines these individual terms as in Eq. (6), where $L_{cls}$, $L_{box}$, and $L_{mask}$ are the classification, bounding-box, and mask segmentation losses:

$$L = L_{cls} + \lambda_1 L_{box} + \lambda_2 L_{mask} \tag{6}$$

$$L_{mask} = -\frac{1}{N} \sum_{i=1}^{N} \big[\, y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \,\big] \tag{7}$$

In Eq. (7), $L_{mask}$ is the mask prediction loss computed using binary cross-entropy, where $N$ is the number of pixels in the mask, $y_i$ is the ground-truth value at pixel $i$, and $\hat{y}_i$ is the predicted mask value at pixel $i$. The parameters $\lambda_1$ and $\lambda_2$ are weights that balance the contributions of each loss component during training.
RGM provides an improvement over the traditional Mask R-CNN. It integrates region-based guidance mechanisms that refine object proposals and improve mask predictions. Region-specific attention and enhanced RoI Align allow RGM to achieve better segmentation accuracy in complex and occluded environments, making the model well suited to fine-grained segmentation tasks and the challenges posed by overlapping or fragmented objects.
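The combined objective above can be sketched as follows. This is a minimal NumPy illustration of Eqs. (6) and (7), not the authors' training code; the balancing weights $\lambda_1$ and $\lambda_2$ default to 1.0 here as an assumption:

```python
import numpy as np

def bce_mask_loss(y_true, y_pred, eps=1e-7):
    """Eq. (7): pixel-wise binary cross-entropy over the predicted mask."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def total_loss(l_cls, l_box, l_mask, lam1=1.0, lam2=1.0):
    """Eq. (6): weighted sum of classification, box-regression, and mask losses.
    lam1 and lam2 are the balancing weights (assumed defaults; tuned in practice)."""
    return l_cls + lam1 * l_box + lam2 * l_mask
```

A near-perfect mask prediction drives the BCE term toward zero, while confident wrong predictions are penalized heavily.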
Haralick-ResNet fusion (HRF)
Haralick-ResNet Fusion (HRF) is a hybrid deep learning framework that improves feature extraction for coronary computed tomography angiography images. Abnormalities in the coronary arteries often exhibit subtle textural variations rather than significant shape variations; thus, Haralick features are well suited for computing second-order textural properties such as contrast and homogeneity. Combining these handcrafted texture features with deep ResNet features improves the differentiation of normal from abnormal tissue under varying illumination conditions. The Haralick features derived from the gray-level co-occurrence matrix (GLCM) carry rich information that aids this differentiation. HRF fuses the Haralick features with the deep ResNet representations to achieve improved classification and segmentation.
Haralick features, computed from the GLCM, quantify textural properties such as contrast, correlation, energy, and homogeneity. Given an image $I(x, y)$, the GLCM $P_{d,\theta}$ is defined in Eq. (8), where $d = (d_x, d_y)$ is the pixel displacement, $\theta$ is the orientation, $i, j$ are gray-level intensities, and $M, N$ are the image dimensions:

$$P_{d,\theta}(i, j) = \sum_{x=1}^{M} \sum_{y=1}^{N} \begin{cases} 1, & \text{if } I(x, y) = i \ \text{and} \ I(x + d_x,\, y + d_y) = j \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
From the GLCM, several Haralick features are computed as follows. Equation (9) (contrast) measures local intensity variation, Eq. (10) (correlation) measures pixel intensity relationships, Eq. (11) (energy) measures texture uniformity, and Eq. (12) (homogeneity) measures similarity in the pixel distribution:

$$\text{Contrast} = \sum_{i} \sum_{j} (i - j)^2\, P(i, j) \tag{9}$$

$$\text{Correlation} = \sum_{i} \sum_{j} \frac{(i - \mu_i)(j - \mu_j)\, P(i, j)}{\sigma_i\, \sigma_j} \tag{10}$$

$$\text{Energy} = \sum_{i} \sum_{j} P(i, j)^2 \tag{11}$$

$$\text{Homogeneity} = \sum_{i} \sum_{j} \frac{P(i, j)}{1 + |i - j|} \tag{12}$$
The output of a residual block in ResNet is computed using Eq. (13), where $x_l$ is the input to the $l$th layer, $F(x_l, W_l)$ is the transformation function with weight parameters $W_l$, and $x_{l+1}$ is the output of the residual block:

$$x_{l+1} = F(x_l, W_l) + x_l \tag{13}$$
To combine Haralick features with ResNet features, HRF employs a fusion strategy in the feature space. Let $H$ denote the extracted Haralick feature vector and $R$ the deep feature vector extracted by ResNet. A fully connected layer aligns both feature representations as given in Eq. (14), where $W_H$ and $W_R$ are trainable weight matrices and $F_{fused}$ is the final fused feature representation:

$$F_{fused} = \mathrm{ReLU}\big(W_H H + W_R R\big) \tag{14}$$
Once the fused feature representation $F_{fused}$ is obtained, it is passed through a classifier for disease prediction. The classification is performed using a softmax activation function as given in Eq. (15), where $P(c \mid F_{fused})$ is the probability of class $c$ and $w_c$, $b_c$ are the weight and bias parameters for class $c$; the denominator ensures normalization across all classes:

$$P(c \mid F_{fused}) = \frac{\exp\big(w_c^{T} F_{fused} + b_c\big)}{\sum_{c'} \exp\big(w_{c'}^{T} F_{fused} + b_{c'}\big)} \tag{15}$$
For segmentation tasks, HRF utilizes an up-sampling network with convolutional layers that predicts a binary mask $M(x, y)$ for coronary artery regions as given in Eq. (16), where $W_s$ are trainable segmentation parameters, $*$ denotes convolution, and $\sigma$ is the sigmoid function:

$$M(x, y) = \sigma\big(W_s * F_{fused}\big) \tag{16}$$
In Fig. 2, the HRF framework first computes the Haralick features contrast, correlation, energy, and homogeneity from the Gray-Level Co-occurrence Matrix. These Haralick features are combined with deep features extracted by ResNet using a fully connected layer to create robust fused feature representations. The fused features are then passed through a softmax classifier for disease prediction or, for segmentation tasks, through an upsampling network to predict a binary mask for coronary artery regions.
Fig. 2.
Architecture of enhanced HRF architecture.
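The fusion-and-classification path of Eqs. (14) and (15) can be sketched as follows. All dimensions and weight matrices here are stand-in assumptions for illustration, not the paper's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Hypothetical sizes: 4 Haralick features, 8-dim deep features, 6-dim fused space.
d_h, d_r, d_f = 4, 8, 6
W_H = rng.standard_normal((d_f, d_h)) * 0.1   # trainable in practice
W_R = rng.standard_normal((d_f, d_r)) * 0.1

def fuse(h, r):
    """Eq. (14): project both feature vectors into a shared space and combine."""
    return relu(W_H @ h + W_R @ r)

def softmax(z):
    z = z - z.max()                # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Eq. (15): linear classifier over the fused representation (2 classes assumed).
n_classes = 2
W_c = rng.standard_normal((n_classes, d_f)) * 0.1
b_c = np.zeros(n_classes)

h = rng.random(d_h)   # stand-in Haralick vector
r = rng.random(d_r)   # stand-in ResNet embedding
p = softmax(W_c @ fuse(h, r) + b_c)   # class probabilities, sums to 1
```

In the actual framework the two projections would be learned jointly with the ResNet backbone rather than drawn at random.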
DeepConvNet classifier
A DeepConvNet has gained widespread use in classification applications across multiple domains, such as image recognition, biomedical signal processing, and natural language processing. CCTA images exhibit considerable intra-class variation, which DeepConvNet captures through hierarchical feature representation, from low-level texture information to higher-level pathological patterns. A deep network trained in this way can achieve better generalization and removes the need for manual feature extraction. The core components of the DeepConvNet are convolutional layers, activation functions, pooling layers, and fully connected layers. The convolutional layer is the building block of DeepConvNet that extracts spatial features from input data. Each convolutional layer applies a set of learnable filters W to the input feature map X to produce an output feature map Y. The convolution operation is defined using Eq. (17).
$$Y_{i,j}^{(l)} = \sum_{m} \sum_{n} W_{m,n}^{(l)}\, X_{i+m,\, j+n}^{(l-1)} + b^{(l)} \tag{17}$$

Here $Y_{i,j}^{(l)}$ is the output feature at location $(i, j)$ in layer $l$, $X^{(l-1)}$ is the corresponding input feature map from the previous layer, and $W^{(l)}$ is the convolutional filter. The filters in DeepConvNet are trained using backpropagation to learn important patterns from the data. The most commonly used activation function is the Rectified Linear Unit (ReLU), defined in Eq. (18): if $x > 0$ the function outputs $x$, and if $x \le 0$ it outputs 0.

$$f(x) = \max(0, x) \tag{18}$$
To reduce the spatial dimensions of feature maps and improve computational efficiency, DeepConvNet uses pooling layers. The most common pooling operation is max pooling, which selects the maximum value in a local window as given in Eq. (19), where $Y_{i,j}$ is the pooled output and $X_{m,n}$ are the values in the pooling window $\mathcal{W}_{i,j}$. Pooling helps retain the most significant features while reducing the number of parameters and the computational cost:

$$Y_{i,j} = \max_{(m, n) \in \mathcal{W}_{i,j}} X_{m,n} \tag{19}$$
After feature extraction, the output feature maps are flattened and passed through fully connected layers for classification. The fully connected layer applies a weight matrix $W$ and a bias term $b$ to transform the input vector $x$ as given in Eq. (20):

$$y = W x + b \tag{20}$$
For multiclass classification, the final output is computed using the softmax as given in Eq. (21), where $P(c \mid x)$ is the probability of class $c$ and $w_c$, $b_c$ are the weight and bias for class $c$:

$$P(c \mid x) = \frac{\exp\big(w_c^{T} x + b_c\big)}{\sum_{c'} \exp\big(w_{c'}^{T} x + b_{c'}\big)} \tag{21}$$
Figure 3 demonstrates the architecture of DeepConvNet, a deep convolutional neural network used for feature extraction and classification. It includes numerous convolutional layers, each followed by ReLU for nonlinearity, batch normalization, and max pooling to reduce feature dimensions. Deeper layers capture abstract high-level features, while the final fully connected layers perform classification. Dropout mitigates overfitting and improves learning efficiency. The architecture is optimized using backpropagation and an adaptive optimizer for better convergence. This design allows DeepConvNet to effectively learn hierarchical patterns from complex datasets for image recognition and classification tasks.
Fig. 3.

An architecture of DeepConvNet.
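The conv → ReLU → max-pool pipeline of Eqs. (17)–(19) can be illustrated with a toy NumPy forward pass; the input and filter here are assumed examples, not the network's learned weights:

```python
import numpy as np

def conv2d(X, W, b=0.0):
    """Eq. (17): valid 2-D convolution (cross-correlation form)."""
    kh, kw = W.shape
    H, Wd = X.shape
    out = np.zeros((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (X[i:i + kh, j:j + kw] * W).sum() + b
    return out

def relu(x):                      # Eq. (18)
    return np.maximum(0.0, x)

def max_pool(X, k=2):             # Eq. (19): max over non-overlapping k×k windows
    H, W = X.shape
    H2, W2 = H // k, W // k
    return X[:H2 * k, :W2 * k].reshape(H2, k, W2, k).max(axis=(1, 3))

X = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
W = np.ones((3, 3)) / 9.0                      # 3x3 averaging filter
Y = max_pool(relu(conv2d(X, W)))               # conv -> ReLU -> pool, 2x2 output
```

The 6×6 input shrinks to 4×4 after the valid convolution and to 2×2 after pooling, mirroring how each stage of DeepConvNet reduces spatial dimensions.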
Data handling and experimental protocol
The CCTA dataset employed in this paper comprises 206 patient cases obtained from a publicly accessible Kaggle repository30. Each case contains high-resolution volumetric CT angiography images with segmentation masks for the individual cardiovascular structures.
All images were converted to a standardized format and resampled to a common spatial resolution. Min-max normalization mapped pixel intensities to the [0, 1] range. To minimize scanner-related variability, histogram equalization and Gaussian smoothing were applied during preprocessing.
Patient-wise splitting was performed to ensure reproducibility and prevent data leakage. The dataset was divided into 70% training, 15% validation, and 15% testing subsets. In addition, three-fold cross-validation was performed to check robustness across splits.
Data augmentation techniques such as horizontal flipping, rotation (±10°), and scaling were applied only to the training set, to improve generalization and avoid information leakage into the validation and test sets.
Annotation bias was mitigated by cross-checking the segmentation masks against publicly available reference masks. Hyperparameters were selected on the validation set to avoid overfitting. All experiments used the Adam optimizer with a learning rate of 0.001 and a batch size of 8.
All experiments were implemented in Python with the PyTorch framework, and a fixed random seed was set to make the results reproducible.
Results and discussion
The dataset consisted of CCTA images from 206 patients, totalling 5.130 billion pixels. The patients had a mean age of 59.9 ± 9.4 years, and 42.7% were female. Although this dataset is freely accessible and allows the research to be replicated, it carries several potential sources of bias:
Its demographic structure may be imbalanced across age groups and sexes, and the acquisition methodology may vary with the scanner used. Annotation bias may also arise from differences in segmentation performed by different annotators. In addition, the model's performance depends on hyperparameters, including filter size (3 × 3), number of layers (18), learning rate (0.001), and batch size (8). The dataset description is available at https://www.kaggle.com/datasets/saurabhshahane/segmentation-of-multiple-cardiovascular-structures, and the images below were used for performance analysis.
Figure 4 compares (a) the original image with (b) the segmentation result predicted by the proposed Region-Guided Mask R-CNN. Figure 5 displays a sample processed image in which the proposed framework has correctly identified and localized the coronary sinus.
Fig. 4.
An original image vs. Segmented image (Region-Guided Mask R-CNN).
Fig. 5.
A sample of image and predicted as Coronary sinus using the proposed work.
Performance is evaluated using accuracy, precision, recall, and F1-score, as defined by equations (22) to (25), where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives; A is the predicted segmentation mask and B is the ground-truth mask. The experiments were performed on a publicly available CCTA dataset of 206 patients covering various cardiovascular structures, including the coronary sinus, aorta, ventricles, and pulmonary artery. Segmentation masks were predicted from the extracted features and compared against the ground truth. A patient-level split of 70% training, 15% validation, and 15% testing was used, together with three-fold cross-validation.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (22)

Precision = TP / (TP + FP) (23)

Recall = TP / (TP + FN) (24)

F1-score = 2 × Precision × Recall / (Precision + Recall) (25)

Dice(A, B) = 2|A ∩ B| / (|A| + |B|) (26)

IoU(A, B) = |A ∩ B| / |A ∪ B| (27)
Table 2 shows that data augmentation improves the proposed approach by a clear margin over training without augmentation. The Dice coefficient with augmentation is 0.91 versus 0.86 without it, indicating that the framework is robust against the effects of imbalanced datasets.
Table 2.
Effect of data augmentation.
| Augmentation Method | Accuracy | Loss | Dice Coefficient | IoU |
|---|---|---|---|---|
| Proposed Work | 96% | 0.12 | 0.91 | 0.87 |
| Without Augmentation | 91% | 0.15 | 0.86 | 0.82 |
| Flipping | 93% | 0.14 | 0.89 | 0.85 |
As Table 2 shows, accuracy is much higher than for the base model without augmentation, which reached only 91% accuracy with a loss of 0.15, a Dice coefficient of 0.86, and an IoU of 0.82; the model trained with flipping augmentation achieved 93% accuracy with a loss of 0.14, a Dice coefficient of 0.89, and an IoU of 0.85. Table 3 compares the proposed work with U-Net and V-Net. The proposed method achieved the best overall performance with 96% accuracy, 0.12 loss, 0.91 Dice coefficient, 0.87 IoU, 0.94 precision, 0.92 recall, 0.93 F1-score, and 0.98 AUC.
Table 3.
Comparison of model performance.
| Model | Accuracy | Loss | Dice Coefficient | IoU | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|---|---|
| Proposed Work | 96% | 0.12 | 0.91 | 0.87 | 0.94 | 0.92 | 0.93 | 0.98 |
| U-Net | 93% | 0.15 | 0.89 | 0.85 | 0.90 | 0.88 | 0.89 | 0.96 |
| V-Net | 94% | 0.14 | 0.90 | 0.86 | 0.92 | 0.89 | 0.90 | 0.97 |
Table 4 compares the segmentation metrics of the proposed work with U-Net and 3D U-Net. The proposed work resulted in a sensitivity of 0.92, specificity of 0.98, precision of 0.94, recall of 0.92, and an F1-score of 0.93. Both U-Net (0.90 sensitivity, 0.95 specificity, 0.89 precision, 0.86 recall, and 0.87 F1-score) and 3D U-Net (0.91 sensitivity, 0.96 specificity, 0.93 precision, 0.91 recall, and 0.92 F1-score) are surpassed by the proposed work.
Table 4.
Comparison of segmentation metrics.
| Model | Sensitivity | Specificity | Precision | Recall | F1-Score |
|---|---|---|---|---|---|
| Proposed Work | 0.92 | 0.98 | 0.94 | 0.92 | 0.93 |
| U-Net | 0.90 | 0.95 | 0.89 | 0.86 | 0.87 |
| 3D U-Net | 0.91 | 0.96 | 0.93 | 0.91 | 0.92 |
Table 5 reports the computation time and parameter count of each model. The proposed work required 5.5 min of computation with 40 M parameters, U-Net took 4.2 min with 25 M parameters, and V-Net required 5.0 min with 32 M parameters.
Table 5.
Computation time and parameter details.
| Model | Computation Time | Parameter Count |
|---|---|---|
| Proposed Work | 5.5 min | 40 M |
| U-Net | 4.2 min | 25 M |
| V-Net | 5.0 min | 32 M |
Table 6 reports the segmentation performance on different cardiovascular structures for the proposed work, U-Net, and V-Net. The proposed work outperformed U-Net and V-Net on most structures. For example, it obtained a segmentation score of 0.90 for the coronary sinus, 0.87 for the descending aorta, and 0.92 for the left atrial wall. In comparison, U-Net attained 0.88, 0.85, and 0.90 for the same structures, while V-Net showed similar but slightly lower performance across most structures. The proposed work was also superior in complex regions such as the papillary muscle (LV) and pulmonary artery, both with a score of 0.93.
Table 6.
Segmentation performance for various cardiovascular structures.
| Cardiovascular structure | Proposed work | U-Net | V-Net |
|---|---|---|---|
| Coronary Sinus | 0.90 | 0.88 | 0.89 |
| Descending Aorta | 0.87 | 0.85 | 0.86 |
| Inferior Vena Cava | 0.88 | 0.84 | 0.85 |
| Left Atrial Appendage | 0.91 | 0.89 | 0.90 |
| Left Atrial Wall | 0.92 | 0.90 | 0.91 |
| Papillary Muscle – LV | 0.93 | 0.91 | 0.92 |
| Posterior Mitral Leaflet | 0.89 | 0.87 | 0.88 |
| Proximal Ascending Aorta | 0.91 | 0.89 | 0.90 |
| Pulmonary Artery | 0.93 | 0.91 | 0.92 |
| Right Ventricular Wall | 0.92 | 0.89 | 0.90 |
| Superior Vena Cava | 0.91 | 0.88 | 0.89 |
Table 7 reports an ablation study analysing the impact of the attention mechanism. With the attention mechanism, the proposed work achieved the best results: a Dice coefficient of 0.91, IoU of 0.87, accuracy of 96%, and loss of 0.12. Without the attention mechanism, performance declined slightly to a Dice coefficient of 0.89, IoU of 0.85, accuracy of 94%, and loss of 0.14. U-Net was lower on all metrics, with a Dice coefficient of 0.88, IoU of 0.83, accuracy of 92%, and a loss of 0.16.
Table 7.
Ablation study (Effect of Attention Mechanism).
| Model | Dice Coefficient | IoU | Accuracy | Loss |
|---|---|---|---|---|
| Proposed Work (Attention) | 0.91 | 0.87 | 96% | 0.12 |
| Proposed Work (No Attention) | 0.89 | 0.85 | 94% | 0.14 |
| U-Net | 0.88 | 0.83 | 92% | 0.16 |
When the attention component of the Region-Guided Mask R-CNN was disabled while all other components were left intact, the Dice score, IoU, accuracy, and loss all degraded, confirming that region-guided attention contributes to segmentation accuracy. Hyperparameters were tuned on the validation metrics: the model was trained with a learning rate of 0.001, a batch size of 8, a 3 × 3 kernel size, and 18 convolutional layers, using the Adam optimizer with its momentum values at their default settings.
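The reported optimizer settings (Adam with default momentum terms, learning rate 0.001, batch size 8) translate to a standard PyTorch training step. The tiny model below is a stand-in so the snippet runs end to end; only the optimizer and batch settings come from the text.

```python
import torch
import torch.nn as nn

# Training configuration reported in the text: Adam with default momentum
# terms (betas), learning rate 0.001, batch size 8.
model = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1),
                      nn.ReLU(), nn.Flatten(), nn.Linear(4 * 32 * 32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999))   # PyTorch defaults
criterion = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 32, 32)          # one batch of size 8
y = torch.randint(0, 2, (8,))
loss = criterion(model(x), y)          # forward pass
optimizer.zero_grad()
loss.backward()                        # backpropagation
optimizer.step()                       # adaptive parameter update
print(loss.item() > 0)                 # True
```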
Table 8 shows that the training and validation losses of the proposed work remained consistently lower than those of U-Net across epochs. Table 9 reports segmentation speed: the proposed work surpasses both baselines, reaching 12.3 FPS against 10.5 FPS for U-Net and 11.0 FPS for V-Net.
Table 8.
Comparison of training and validation loss.
| Epoch | Proposed work training loss | Proposed work validation loss | U-Net training loss | U-Net validation loss |
|---|---|---|---|---|
| 1 | 0.48 | 0.55 | 0.50 | 0.58 |
| 5 | 0.30 | 0.35 | 0.32 | 0.38 |
| 10 | 0.12 | 0.14 | 0.16 | 0.20 |
Table 9.
Segmentation speed (Frames per Second).
| Model | Frames per Second (FPS) |
|---|---|
| Proposed Work | 12.3 |
| U-Net | 10.5 |
| V-Net | 11.0 |
Table 10 lists the feature importance scores for the Haralick-ResNet Fusion (HRF) model, indicating how much each feature contributed to the model’s performance. Among the Haralick features, Angular Second Moment (ASM) and Inverse Difference Moment (IDM) have the highest importance scores of 0.12 and 0.10, respectively, followed by Contrast (0.09) and Correlation (0.08). The ResNet features also contribute substantially: ResNet Feature 1 (low-level) and ResNet Feature 3 (contextual) each score 0.10, while ResNet Feature 2 (high-level) has the highest score at 0.12. Figure 6 visualizes these distributions, showing that combining Haralick and ResNet features is important for CAD detection in CCTA images.
Table 10.
Feature Importance for Haralick-ResNet Fusion (HRF).
| Feature | Importance score |
|---|---|
| Angular Second Moment (ASM) | 0.12 |
| Contrast | 0.09 |
| Correlation | 0.08 |
| Variance | 0.07 |
| Inverse Difference Moment (IDM) | 0.10 |
| Sum Average | 0.06 |
| Sum Variance | 0.05 |
| Sum Entropy | 0.05 |
| Entropy | 0.04 |
| Difference Variance | 0.03 |
| Difference Entropy | 0.04 |
| Information Measure of Correlation (IMC) | 0.04 |
| Maximal Correlation Coefficient | 0.03 |
| ResNet Feature 1 (Low-level) | 0.10 |
| ResNet Feature 2 (High-level) | 0.12 |
| ResNet Feature 3 (Contextual) | 0.10 |
| ResNet Feature 4 (Hierarchical) | 0.07 |
Fig. 6.
Feature importance for Haralick-ResNet Fusion (HRF).
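The fusion idea behind HRF is to concatenate co-occurrence-based texture statistics with pooled deep activations into one feature vector. The sketch below computes three of the thirteen Haralick features in Table 10 (ASM, contrast, IDM) from a tiny horizontal-offset GLCM; the quantization level, single offset, and the random stand-in for the ResNet branch are our simplifications, not the paper's implementation.

```python
import numpy as np

def glcm_features(img, levels=8):
    """Tiny GLCM sketch for the Haralick side of HRF: quantize to a few
    gray levels, count horizontal neighbour pairs, then derive ASM,
    contrast, and IDM (three of the 13 Haralick features)."""
    q = np.floor(img * (levels - 1e-9)).astype(int)   # img assumed in [0, 1)
    glcm = np.zeros((levels, levels))
    for row in q:
        for a, b in zip(row[:-1], row[1:]):
            glcm[a, b] += 1                            # horizontal co-occurrence
    p = glcm / glcm.sum()                              # joint probabilities
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                               # Angular Second Moment
    contrast = np.sum(((i - j) ** 2) * p)
    idm = np.sum(p / (1.0 + (i - j) ** 2))             # Inverse Difference Moment
    return np.array([asm, contrast, idm])

rng = np.random.default_rng(0)
img = rng.random((32, 32))
texture = glcm_features(img)
deep = rng.random(16)                      # stand-in for pooled ResNet activations
fused = np.concatenate([texture, deep])    # HRF-style fused feature vector
print(fused.shape)                         # (19,)
```

In practice the `deep` vector would come from a pretrained ResNet's pooled feature maps rather than random numbers.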
Ablation experiment results show that region-guided attention and feature fusion play an important role in improving the Dice and IoU metrics. The hyperparameters were set to a 3 × 3 kernel size, a learning rate of 0.001, and a batch size of 8 to improve convergence on high-resolution medical data while preserving texture information.
Table 11 presents the cross-validation results for the proposed work and U-Net. The proposed work performs better than U-Net, with accuracies of 96% in Fold 1, 97% in Fold 2, and 95% in Fold 3, against 93%, 94%, and 92% for U-Net. The proposed work also maintains a lower loss than U-Net across all folds, again demonstrating its robustness.
Table 11.
Cross validation results.
| Fold | Proposed work accuracy | Proposed work loss | U-Net accuracy | U-Net loss |
|---|---|---|---|---|
| Fold 1 | 96% | 0.12 | 93% | 0.14 |
| Fold 2 | 97% | 0.11 | 94% | 0.13 |
| Fold 3 | 95% | 0.13 | 92% | 0.15 |
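Because the folds, like the main split, must hold out whole patients, the three-fold protocol can be sketched as a grouped split. The round-robin fold assignment below is our illustrative choice; the paper only specifies that splitting was patient-wise with three folds.

```python
def patient_folds(patient_ids, k=3):
    """Sketch of three-fold patient-wise cross-validation: every fold
    holds out whole patients, so no patient contributes images to both
    the training and evaluation sides of the same fold."""
    ids = sorted(patient_ids)
    folds = [ids[i::k] for i in range(k)]              # round-robin assignment
    for i, held_out in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        yield train, held_out

for train_ids, test_ids in patient_folds(range(206)):
    print(len(train_ids), len(test_ids))               # each pair sums to 206
```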
Table 12 compares the proposed model across batch sizes, showing that a batch size of 8 performed best, with 96% accuracy, a loss of 0.12, a Dice coefficient of 0.91, and an IoU of 0.87. Larger batch sizes (16 and 32) show a drop in performance, with accuracy and Dice coefficient declining as the batch size increases, confirming the importance of batch size for training efficiency and model performance.
Table 12.
Impact of batch size.
| Batch Size | Accuracy | Loss | Dice coefficient | IoU |
|---|---|---|---|---|
| 8 | 96% | 0.12 | 0.91 | 0.87 |
| 16 | 94% | 0.14 | 0.89 | 0.85 |
| 32 | 93% | 0.15 | 0.87 | 0.82 |
Table 13 shows the performance of the DeepConvNet classifier in terms of accuracy, precision, and recall for the different cardiovascular structures. Accuracy is high throughout, ranging from 0.87 for the descending aorta to 0.93 for the pulmonary artery and papillary muscle (LV). The precision and recall scores closely mirror the accuracy, indicating balanced detection performance across structures. For instance, for the left atrial wall, accuracy was 0.92, precision 0.93, and recall 0.94, showing that this structure is well captured by the network.
Table 13.
DeepConvNet classifier based on accuracy, precision and recall.
| Cardiovascular structure | Proposed work (Accuracy) | Proposed work (Precision) | Proposed work (Recall) |
|---|---|---|---|
| Coronary Sinus | 0.90 | 0.91 | 0.92 |
| Descending Aorta | 0.87 | 0.88 | 0.89 |
| Inferior Vena Cava | 0.88 | 0.89 | 0.90 |
| Left Atrial Appendage | 0.91 | 0.92 | 0.93 |
| Left Atrial Wall | 0.92 | 0.93 | 0.94 |
| Papillary Muscle – LV | 0.93 | 0.94 | 0.95 |
| Posterior Mitral Leaflet | 0.89 | 0.90 | 0.91 |
| Proximal Ascending Aorta | 0.91 | 0.92 | 0.93 |
| Pulmonary Artery | 0.93 | 0.94 | 0.95 |
| Right Ventricular Wall | 0.92 | 0.93 | 0.94 |
| Superior Vena Cava | 0.91 | 0.92 | 0.93 |
Figure 7 shows that the classes are fully separable in ROC space, with each class reaching an AUC of 1.0 and discriminating between positive and negative cases at all decision thresholds. This suggests that the learned feature space achieves complete class separation, with the true positive rate approaching unity at a low false positive rate and low classification uncertainty.
Fig. 7.

ROC curve of the proposed model.
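The AUC values reported in Fig. 7 can be computed directly from scores and labels via the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive sample scores above a randomly chosen negative one. A minimal sketch (helper name ours; ties are not handled):

```python
import numpy as np

def auc_score(y_true, scores):
    """Rank-based AUC: probability that a random positive outranks a
    random negative, computed from the Mann-Whitney U statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)       # 1-based ranks
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Perfectly separated scores reproduce the AUC = 1.0 behaviour of Fig. 7.
y = np.array([0, 0, 0, 1, 1, 1])
print(auc_score(y, np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])))   # 1.0
print(auc_score(y, np.array([0.9, 0.2, 0.3, 0.7, 0.8, 0.1])))   # lower: overlap
```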
Table 14 compares the proposed DeepConvNet with other classification models; it achieved the best results, with an accuracy of 98.3% and a loss of 0.15. In comparison, the existing VGG-16, ResNet-50, InceptionV3, and EfficientNet-B3 models show lower accuracy and higher loss values, indicating that DeepConvNet addresses the task more effectively.
Table 14.
Comparative analysis with other models.
| Model | Accuracy (%) | Loss |
|---|---|---|
| Proposed DeepConvNet | 98.3 | 0.15 |
| VGG-16 | 87.2 | 0.4 |
| ResNet-50 | 90.8 | 0.32 |
| InceptionV3 | 91.5 | 0.29 |
| EfficientNet-B3 | 93.0 | 0.26 |
Figure 8 visualizes the accuracy and loss of each model in the comparative analysis. The results demonstrate that the proposed DeepConvNet outperforms all other models, with a classification accuracy of 98.3% and a low loss of 0.15. This can be attributed to the effective extraction of ordered convolutional features and non-linear feature modelling, which significantly increases class separability on CCTA image data. VGG-16 has a higher loss owing to its large parameter count and redundant feature extraction, while ResNet-50, InceptionV3, and EfficientNet-B3 reach intermediate performance.
Fig. 8.
Visualization of model accuracy and loss.
Table 15 compares the proposed DeepConvNet with Hybrid Feature Learning against classification approaches from previous research. The proposed model outperforms the existing methodologies, achieving an accuracy of 98.3%, precision of 98.0%, recall of 98.5%, and F1-score of 98.2%, whereas approaches such as CNN, Deep CNN, and Efficient CNN report markedly lower metrics.
Table 15.
Comparison with related works.
| S. No | Author(s) et al. (Year) | Methodology | Accuracy (%) | Precision | Recall | F1 Score |
|---|---|---|---|---|---|---|
| 1 | Wu et al. (2025) | CNN | 91 | 90.5 | 91.2 | 90.8 |
| 2 | Ihdayhid et al. (2024) | Deep CNN | 90 | 89.8 | 90.2 | 90.0 |
| 3 | Dutta et al. (2020) | Efficient CNN | 90 | 89.5 | 90.1 | 89.8 |
| 4 | Huang et al. (2023) | CNN-LSTM | 89 | 88.8 | 89.5 | 89.1 |
| 5 | Han et al. (2020) | DCNN | 89 | 88.5 | 89.2 | 88.8 |
| 6 | Chen et al. (2024) | Positive-unlabeled learning | 88 | 87.8 | 88.5 | 88.1 |
| 7 | Saboor et al. (2022) | SVM + RF | 88 | 87.5 | 88.2 | 87.8 |
| 8 | Proposed Work | DeepConvNet with Hybrid Feature Learning | 98.3 | 98.0 | 98.5 | 98.2 |
Conclusion
The Region-Guided Mask R-CNN model is more efficient than the conventional Mask R-CNN architecture: its attention mechanisms refine object proposals and improve segmentation accuracy in cluttered scenes. Adding region-guided attention along with RoI alignment improves the model’s ability to differentiate fine-grained objects and eliminates the misalignment problems of traditional RoI pooling. The second component, Haralick-ResNet fusion, combines Haralick texture features with deep hierarchical ResNet representations to enhance feature extraction in coronary CCTA images, improving both classification and segmentation accuracy where traditional CNNs fail to capture the subtle texture variations present in medical imaging. The proposed approach achieved a Dice coefficient of 0.91 and a Jaccard index (IoU) of 0.87, surpassing U-Net (0.89 Dice, 0.85 IoU) and V-Net (0.90 Dice, 0.86 IoU) and thus demonstrating improved segmentation accuracy. The proposed DeepConvNet with Hybrid Feature Learning attained an accuracy of 98.3%, precision of 98.0%, recall of 98.5%, and F1-score of 98.2%. In future work, combining RGM with image pre-processing techniques such as multi-scale feature extraction could further improve segmentation performance in more complex settings.
Author contributions
G. Revathi conceived and designed the study, developed and implemented the region-guided Mask R-CNN and Haralick–ResNet fusion algorithms, performed the experiments and quantitative analyses, and prepared the original draft of the manuscript. Dr. Cyril Mathew O contributed to study conceptualization and methodology, supervised the research and algorithm development, provided clinical insight for dataset curation and interpretation of results, assisted with validation and statistical analysis, and critically revised the manuscript for important intellectual content. Both authors reviewed and approved the final manuscript and agree to be accountable for all aspects of the work.
Data availability
The CCTA dataset utilized in the current study is publicly accessible in the Kaggle repository30. The processed datasets, segmentation masks, and derived feature representations generated during the current study are available from the corresponding author on reasonable request. Data handling used patient-wise splitting to avoid leakage between training and test sets.
Code availability
The complete implementation of the proposed Region-Guided Mask R-CNN (RGM), Haralick-ResNet Fusion (HRF), and DeepConvNet classifier was developed in Python using the PyTorch deep learning framework.
The source code, including model architectures, preprocessing scripts, the training pipeline, evaluation routines, and configuration files, is publicly available in the GitHub repository at https://github.com/Grevathi123/CORONARY-ARTERY-DISEASE-DETECTION.git. Owing to GitHub file-size limits, large pretrained model weights, processed datasets, and experimental outputs are hosted externally; access instructions are provided in the repository README. To ensure long-term accessibility and version control, an archived release of the repository corresponding to the version used to generate the reported results will be deposited in a DOI-minting repository (e.g., Zenodo), and the DOI link will be added upon publication to guarantee permanent availability and citation stability. All dependencies are open-source libraries; no proprietary software was employed.
Declarations
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1.Zhou, T., Shen, C. & Bourantas, C. AS-0143 Deep Learning–Based Analysis of CCTA Images for the Assessment of Plaque Characteristics and Risk in Coronary Artery Disease. J. Am. Coll. Cardiol.84 (22_Supplement), C20–C20 (2024). [Google Scholar]
- 2.Brendel, J. M. et al. Coronary artery disease detection using deep learning and ultrahigh-resolution photon-counting coronary CT angiography. Diagnostic Interv. Imag. (2024). [DOI] [PubMed]
- 3.Chu, M. et al. Advances in diagnosis, therapy, and prognosis of coronary artery disease powered by deep learning algorithms. JACC: Asia. 3 (1), 1–14 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.van Noort, D. et al. Evaluating machine learning accuracy in detecting significant coronary stenosis using CCTA-derived fractional flow reserve: Meta-analysis and systematic review. IJC Heart & Vasculature, 101528. (2024). [DOI] [PMC free article] [PubMed]
- 5.Shu, Z. Y. et al. Predicting chronic myocardial ischemia using CCTA-based radiomics machine learning nomogram. J. Nuclear Cardiol.29 (1), 262–274 (2022). [DOI] [PubMed] [Google Scholar]
- 6.Lin, A. et al. Deep learning-enabled coronary CT angiography for plaque and stenosis quantification and cardiac risk prediction: an international multicentre study. Lancet Digit. Health. 4 (4), e256–e265 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Bhowmik, P. K. et al. Advancing Heart Disease Prediction through Machine Learning: Techniques and Insights for Improved Cardiovascular Health. Br. J. Nurs. Stud.4 (2), 35–50 (2024). [Google Scholar]
- 8.Reza-Soltani, S. et al. The role of artificial intelligence and machine learning in cardiovascular imaging and diagnosis. Cureus, 16(9), e68472. (2024). [DOI] [PMC free article] [PubMed]
- 9.Bizimana, P. C., Zhang, Z., Asim, M., El-Latif, A. A. A. & Hammad, M. Learning-based techniques for heart disease prediction: a survey of models and performance metrics. Multimedia Tools Appl., 83(13). (2024).
- 10.Diwakar, M., Tripathi, A., Joshi, K., Memoria, M. & Singh, P. Latest trends on heart disease prediction using machine learning and image fusion. Mater. Today:Proc.37, 3213–3218 (2021). [Google Scholar]
- 11.Han, D. et al. Deep learning analysis in coronary computed tomographic angiography imaging for the assessment of patients with coronary artery stenosis. Comput. Methods Programs Biomed.196, 105651 (2020). [DOI] [PubMed] [Google Scholar]
- 12.Hu, J., Hao, G., Xu, J., Wang, X. & Chen, M. Deep learning-based coronary artery calcium score to predict coronary artery disease in type 2 diabetes mellitus. Heliyon, 10(6). (2024). [DOI] [PMC free article] [PubMed]
- 13.Ihdayhid, A. R. et al. Coronary artery stenosis and high-risk plaque assessed with an unsupervised fully automated deep learning technique. JACC: Advances, 100861. (2024). [DOI] [PMC free article] [PubMed]
- 14.Paul, J. F., Rohnean, A., Giroussens, H., Pressat-Laffouilhere, T. & Wong, T. Evaluation of a deep learning model on coronary CT angiography for automatic stenosis detection. Diagn. Interv. Imaging. 103 (6), 316–323 (2022). [DOI] [PubMed] [Google Scholar]
- 15.Wu, Y., Qi, H., Zhang, X. & Xing, Y. Predictive Value of CCTA-based Pericoronary Adipose Tissue Imaging for Major Adverse Cardiovascular Events. Acad. Radiol.32 (1), 91–101 (2025). [DOI] [PubMed] [Google Scholar]
- 16.Chen, F. et al. Positive-unlabeled learning for coronary artery segmentation in CCTA images. Biomed. Signal Process. Control. 87, 105473 (2024). [Google Scholar]
- 17.Ainiwaer, A. et al. Deep learning of heart-sound signals for efficient prediction of obstructive coronary artery disease. Heliyon, 10(1). (2024). [DOI] [PMC free article] [PubMed]
- 18.Huang, Z. et al. Clinical evaluation of the automatic coronary artery disease reporting and data system (CAD-RADS) in coronary computed tomography angiography using convolutional neural networks. Acad. Radiol.30 (4), 698–706 (2023). [DOI] [PubMed] [Google Scholar]
- 19.Dutta, A., Batabyal, T., Basu, M. & Acton, S. T. An efficient convolutional neural network for coronary heart disease prediction. Expert Syst. Appl.159, 113408 (2020). [Google Scholar]
- 20.Alizadehsani, R. et al. Coronary artery disease detection using computational intelligence methods. Knowl. Based Syst.109, 187–197 (2016). [Google Scholar]
- 21.Mienye, I. D. & Jere, N. Optimized ensemble learning approach with explainable AI for improved heart disease prediction. Information15 (7), 394 (2024). [Google Scholar]
- 22.Kusunose, K. et al. A deep learning approach for assessment of regional wall motion abnormality from echocardiographic images. Cardiovasc. Imaging. 13 (2_Part_1), 374–381 (2020). [DOI] [PubMed] [Google Scholar]
- 23.Saboor, A. et al. A method for improving prediction of human heart disease using machine learning algorithms. Mob. Inform. Syst.2022 (1), 1410169 (2022). [Google Scholar]
- 24.El-Hasnony, I. M., Elzeki, O. M., Alshehri, A. & Salem, H. Multi-label active learning-based machine learning model for heart disease prediction. Sensors22 (3), 1184 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Haq, A. U., Li, J. P., Memon, M. H., Nazir, S. & Sun, R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inform. Syst.2018 (1), 3860146 (2018). [Google Scholar]
- 26.Kim, J. K. & Kang, S. Neural Network-Based Coronary Heart Disease Risk Prediction Using Feature Correlation Analysis. J. Healthc. Eng.2017 (1), 2780501 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Angraal, S. et al. Machine learning prediction of mortality and hospitalization in heart failure with preserved ejection fraction. JACC: Heart Fail.8 (1), 12–21 (2020). [DOI] [PubMed] [Google Scholar]
- 28.Sharma, P. et al. Artificial plant optimization algorithm to detect heart rate & presence of heart disease using machine learning. Artif. Intell. Med.102, 101752 (2020). [DOI] [PubMed] [Google Scholar]
- 29.Mienye, I. D. & Sun, Y. Improved heart disease prediction using particle swarm optimization based stacked sparse autoencoder. Electronics10 (19), 2347 (2021). [Google Scholar]
- 30.Dataset Collection. Kaggle repository: https://www.kaggle.com/datasets/saurabhshahane/segmentation-of-multiple-cardiovascular-structures