Abstract
Cardiovascular diseases (CVDs) have become one of the leading causes of death, posing a significant threat to human life. The development of reliable Artificial Intelligence (AI)-assisted diagnosis algorithms for cardiac sounds is of great significance for the early detection and treatment of CVDs. However, research in this field is scarce, and existing work faces three major challenges: (1) It is mainly limited to murmur classification and cannot achieve murmur grading, while attempting both classification and grading may cause negative interference between the two tasks. (2) It mostly attends to the unstructured cardiac sound modality and ignores the structured demographic modality, as it is difficult to balance the influence of heterogeneous modalities. (3) Deep learning methods lack interpretability, which makes them challenging to apply clinically. To tackle these challenges, we propose a method for cardiac murmur grading and cardiac risk analysis based on heterogeneous modality adaptive multi-task learning. Specifically, a Hierarchical Multi-Task learning-based cardiac murmur detection and grading method (HMT) is proposed to prevent negative interference between the two tasks. In addition, a cardiac risk analysis method based on Heterogeneous Multi-modal feature impact Adaptation (HMA) is proposed, which transforms the unstructured modality into a structured representation and uses an adaptive modality weight learning mechanism to balance the impact of the unstructured and structured modalities, thus enhancing the performance of cardiac risk prediction. Finally, we propose a multi-task interpretability learning module that incorporates an importance evaluation using random masks. This module uses SHAP graphs to visualize crucial murmur segments in cardiac sound and employs a multi-factor risk decoupling model based on nomograms.
We then gain insights into cardiac disease risk in both the pre-decoupling multi-modality and post-decoupling single-modality scenarios, providing a solid foundation for AI-assisted cardiac murmur grading and risk analysis. Experimental results on the large real-world CirCor DigiScope PCG dataset demonstrate that the proposed method outperforms state-of-the-art (SOTA) methods in murmur detection, grading, and cardiac risk analysis, while also providing valuable diagnostic evidence.
Keywords: Cardiac auscultation, Cardiac murmurs detection and grading, Multi-task learning, Risk analysis, Heterogeneous-modality
Introduction
Cardiovascular diseases (CVDs) have escalated to the status of the foremost “number one killer”, posing a grave threat to human health [1]. An estimated 17.9 million people die from cardiovascular disease each year, accounting for 33% of all deaths [2]. Medical research has demonstrated that a staggering 90% of cardiac diseases can be prevented, yet only a quarter of patients who ultimately die from these conditions exhibit symptoms in the early stages [3]. However, the traditional detection of CVDs primarily occurs in hospitals, and a considerable number of patients only seek medical attention after experiencing severe symptoms. This delay in seeking medical care can lead to tragic events, such as sudden death, because optimal treatment is initiated too late. Therefore, the development of portable, home-use, intelligent cardiac sound diagnosis methods is crucial for timely diagnosis of numerous life-threatening cardiac diseases (such as hypertrophic cardiomyopathy [4], myocardial ischemia [5], valvular cardiac disease [6], etc.) and for preventing delays in treatment.
Common methods for the clinical diagnosis of CVDs include electrocardiography and cardiac auscultation [7]. Electrocardiography, which records electrical signals, is typically effective only for detecting arrhythmias, but may not be able to detect structural abnormalities or defects in the heart [8]. In contrast, cardiac auscultation is a more common diagnostic tool for cardiac murmurs, which are abnormal sounds that can be present in healthy individuals but are more commonly found in patients with organic cardiac disease [9]. Studies have shown that cardiac auscultation can achieve a high diagnostic rate of over 90% for cardiac diseases such as arrhythmias [10], congenital cardiac disease [11], and valve disease [12]. Due to its non-invasive nature, lack of reliance on expensive equipment, and ease of use, cardiac auscultation is widely used in community cardiac disease screening. However, traditional cardiac auscultation methods have significant drawbacks as internal medicine and family physicians misdiagnose about 80% of common cardiac problems [13]. The subjective experience and judgment of clinicians in identifying cardiac murmur types can have a direct negative impact on detection results. Therefore, the development of artificial intelligence-assisted cardiac auscultation algorithms is of great significance for improving the accuracy of cardiac disease diagnosis. Such algorithms can help clinicians or patients automatically identify cardiac murmurs, thereby improving the early diagnosis rate of cardiac disease.
However, such research is relatively limited and faces three major challenges.
Most existing research can only classify cardiac murmurs and cannot grade their severity [14, 15]. Performing both the classification and grading tasks may lead to negative interference between them. Compared to cardiac murmur detection, the internationally standardized grading of cardiac murmurs is of greater significance for the diagnosis of cardiac diseases, as not all cardiac murmurs are associated with organic cardiac disease. For example, systolic murmurs of grade II/VI or lower are usually physiological/functional and do not necessarily indicate cardiac disease (e.g., patient #3 as shown in Fig. 1) [16]. However, only a few studies have attempted to grade cardiac murmurs [17]. These methods can only distinguish coarse-grained categories (e.g., NAN, soft, loud, and so on) and cannot simultaneously perform the task of classifying cardiac murmurs. Moreover, because cardiac sound data are inevitably affected by various noises in practical scenarios, it is difficult even for cardiologists to assign the standard murmur grades [13]. As shown in Fig. 1, under the internationally standardized grading, patient #1, whose recording is too noisy to grade, would be assigned the same grade “NAN" as patient #2, even though their murmur types are different: there are murmurs in the cardiac sound data of patient #1, whereas patient #2 has none. If we analyse a patient’s risk of cardiac disease by examining only the murmur grade without murmur classification, it may lead to misdiagnosis. Therefore, it is necessary to perform multi-task learning of both cardiac murmur classification and grading. This is challenging, however, because different tasks need to pay attention to different features, which leads to negative interference between tasks [18].
Most existing research does not consider multi-modal data such as demographic information beyond cardiac sounds, as it is difficult to balance heterogeneous modalities when combining the high-dimensional unstructured cardiac sound monitoring modality with the low-dimensional structured demographic modality. Compared to single-modality data, multi-modal data can more comprehensively reflect the patterns of cardiac disease and helps improve the accuracy of cardiac disease analysis. For instance, as shown in Fig. 1, influenced by the fetal heartbeat, patient #3, who is pregnant, usually has a much lower risk of organic cardiac disease than patient #4, who is not pregnant, even though they both receive a “III/VI" grade. To incorporate demographic information into model building, some existing studies directly combine time-series features extracted by deep models with demographic data [19]. Unfortunately, this inevitably makes the model more susceptible to the influence of the high-dimensional unstructured modality, thereby ignoring the importance of the low-dimensional structured modality. Therefore, when establishing a model for diagnosing cardiac disease, alleviating the dominance of the long time-series data within heterogeneous multi-modal data is a challenging task.
Most existing methods have poor interpretability, making clinical application difficult. Current AI-assisted cardiac sound diagnosis methods typically rely on “black-box" models such as deep learning, which require a large amount of high-dimensional data for effective training. This often results in thousands or even millions of nodes in the input layer, making it difficult to interpret the model’s parameters and internal mechanisms. Although these models perform excellently on many tasks, doctors and patients often find it hard to understand how they work internally. This is a serious problem in the medical field, because doctors need to understand a model’s predictions in order to provide a final diagnosis and give reasonable diagnostic evidence. The lack of diagnostic evidence may limit the clinical application of these methods.
To address the aforementioned challenges, this paper proposes a method for cardiac murmur grading and cardiac disease risk analysis based on heterogeneous modality adaptive multi-task learning. Specifically, this paper has three main contributions.
A hierarchical multi-task learning (HMT) method is proposed for cardiac murmur detection and grading, which avoids the negative effects of multi-task learning between cardiac murmur detection and grading by constructing a hierarchical architecture.
A cardiac disease risk analysis method based on heterogeneous multi-modal feature impact adaptation (HMA) is proposed. By assigning appropriate weights to different types of features, the method balances the impact of the high-dimensional unstructured modality (cardiac sounds) with that of the low-dimensional structured modality (demographics).
A multi-task explainable learning module is proposed. For the HMT module, a random-masking-based importance evaluation method measures the contribution of segments of the unstructured cardiac sound modality to the model, and SHAP graphs visualize the important murmur segments in the cardiac sound monitoring data. For the HMA module, a nomogram-based multi-factor risk decoupling model is proposed. By learning the cardiac disease risks before and after decoupling, this method models the contribution of different modalities to cardiac disease risk, providing a basis for risk analysis in intelligent assisted diagnosis. This improves the interpretability and clinical applicability of cardiac disease diagnosis models, making them more accessible and useful for doctors and patients.
Fig. 1.
An example of cardiac murmur detection, grading, and risk analysis of cardiac diseases. On the one hand, according to the international standardized grading of cardiac murmurs, both patient #1 and patient #2 would be graded as “NAN", even though their murmur types are “Unknown" (meaning the recording is too noisy to grade, so some potential cardiac disease risk remains) and “Absent" (meaning there is no cardiac murmur and a very low cardiac disease risk), respectively. Thus, analysing a patient’s risk of cardiac disease from the murmur grade alone, without murmur classification, may lead to misdiagnosis. Moreover, compared to cardiac murmur detection, the grading of cardiac murmurs is of greater significance for the diagnosis of cardiac diseases, as not all cardiac murmurs are associated with organic cardiac disease. For instance, patient #3 has a much higher risk of cardiac disease than patient #4, as systolic murmurs of grade II/VI or lower are mostly physiological/functional and do not necessarily indicate cardiac disease. It is therefore necessary to perform multi-task learning of both cardiac murmur classification and grading. On the other hand, compared to patient #4, patient #5, who is pregnant, may have a much lower risk, even though she receives a “III/VI" grade, which is generally associated with organic cardiac disease. That is mainly because the heart sounds of a pregnant woman may be affected by the fetal heartbeat. Therefore, considering multi-modal data helps improve the accuracy of cardiac disease analysis
Related work
The existing research can be categorized into three main groups: machine learning methods based on manual feature engineering, deep learning methods, and methods that combine them.
Feature engineering method
Commonly used feature extraction methods in this domain primarily revolve around signal processing [20–24], feature segmentation [15, 25], and Mel-scale Frequency Cepstral Coefficients (MFCC) [22, 26].
For example, Fatemeh et al. [20] proposed a multi-level basis selection signal processing method that achieved promising prediction outcomes. In this approach, three exclusion criteria were defined to eliminate noisy, less informative nodes, thereby reducing the dimensionality of the feature space. Kotb et al. [27] utilized MFCC features and selected 13 MFCC coefficients to construct a Hidden Markov Model (HMM) for cardiac murmur classification. By integrating three different machine learning models (RF, KNN, and XGB) and leveraging the discrete wavelet transform and MFCC, Adyasha et al. [22] achieved effective classification on unbalanced training and test datasets.
Although machine learning methods demonstrate strong performance on specific datasets, they heavily rely on expert knowledge due to the necessity of manual feature extraction. Consequently, the generalizability of such methods is limited, making practical implementation challenging.
Deep learning method
Unlike traditional machine learning methods, deep learning is a technique capable of automatically extracting features and training models without human intervention. Currently, numerous deep learning methods have achieved remarkable precision in various domains.
In the context of cardiac sound classification, studies have explored the use of Convolutional Neural Networks (CNN) [28, 29], residual blocks [30, 31], Long Short-Term Memory (LSTM) [32, 33], and Recurrent Neural Networks (RNN) [32, 34, 35]. These methods have demonstrated promising results.
For example, Patwa et al. [29] compared the performance of 1D-CNN, 2D-CNN, RNN, CNN combined with RNN, and ResNet, finding that 1D-CNN achieved the highest accuracy. They also conducted an in-depth analysis of the model, providing insights into the optimal network architecture and parameters. Raza et al. [32] designed a network architecture based on RNN and LSTM that efficiently detects heartbeat signals, providing valuable information for determining the need for further treatment. Andoni et al. [17] proposed a convolutional-neural-network-based method to classify systolic murmur intensity. However, this method only implements coarse-grained grading and does not adhere to the six-level Levine standard [36].
Nevertheless, it is important to note that deep learning models typically require more memory and longer training times, making them computationally expensive. Additionally, their interpretability can be limited, and unexpected deviations may occur in certain tasks.
Combined method
To address the issue of poor interpretability, an increasing number of researchers are combining deep learning with traditional machine learning techniques, leading to three distinct categories of methods.
The first category involves adding a feature preprocessing module before the deep learning model [37–39]. For example, Araujo et al. [39] incorporated demographic data and utilized random splicing. However, a weakness of their method is the potential exclusion of certain embeddings, which can impact the final classification if important information is missed.
The second category involves using neural networks for feature extraction and subsequently employing machine learning models for complete classification [40]. Gunduz et al. [40] employed a hybrid DNN model combined with MFCC features and made predictions using decision trees, support vector machines (SVM), naive Bayes (NB), and K-nearest neighbors (KNN). This method demonstrates the fusion of deep learning and traditional machine learning techniques.
The third category takes into account the impact of demographic data on murmur detection and grading. For instance, pregnancy can introduce additional sounds from the fetus’s heartbeat, influencing the quality and frequency of murmurs. However, most existing methods do not adequately consider demographics. Chang et al. [41] proposed a multi-task learning framework that extracts correlation embedding vectors of cardiac sound records, representing them in both the time domain (original time series) and frequency domain (MFCC features). This method not only achieves good interpretability but also incorporates demographic data for multi-modal predictions.
It is important to weigh the strengths and weaknesses of these methods, since some may overlook crucial information or suffer from limitations in data representation and processing.
Method
Overview
The cardiac murmur grading and cardiac risk analysis method based on heterogeneous modality adaptive multi-task learning is formulated as a multi-task learning problem, which mainly includes the tasks of cardiac murmur identification, murmur grading, and cardiac disease risk analysis.
The input data of the method is heterogeneous multi-modal cardiac monitoring data of patients, defined as:
$$D = \{A, G, P, H, W, S\} \tag{1}$$
where A, G, P, H, and W respectively represent structured modal data of demographic information such as age, gender, pregnancy status, height, and weight of the patient. $S = \{s_1, s_2, \ldots, s_c\}$ represents the unstructured modality consisting of multiple cardiac auscultation audio time series from different listening channels, where $c$ represents the number of auscultation channels and each channel’s data is a cardiac auscultation audio monitoring sequence of unequal length:
$$s_i = \left(s_i(1),\, s_i(2),\, \ldots,\, s_i(T_i)\right), \quad i = 1, 2, \ldots, c \tag{2}$$
$Y = \{y_d, y_g, y_r\}$ represents the patient’s cardiac sound diagnosis results, where $y_d$ indicates whether the patient has a cardiac murmur (which could be abnormal, normal, or unknown), $y_g$ represents the cardiac murmur grade (which could be NAN, I/VI, II/VI, III/VI, etc., according to the international standardization [42]), and $y_r$ indicates whether the patient has organic cardiac disease.
The method’s architecture is shown in Fig. 2. First, the HMT module is used to model the cardiac auscultation audio data from the different listening channels and obtain the multi-task prediction results $(\hat{y}_d, \hat{y}_g)$, where $\hat{y}_d$ is the model’s output for the cardiac murmur category and $\hat{y}_g$ is the model’s output for the cardiac murmur grading. Then, the HMA module transforms the unstructured modality $S$ output by the HMT module into structured representations $e_d$ and $e_g$, which are combined with the demographic information to form structured modal data, achieving an adaptive balance between the influences of the heterogeneous modalities. The method outputs the patient’s cardiac risk analysis $\hat{y}_r$. Finally, using the multi-task explainable learning module, the results of cardiac murmur identification, grading, and cardiac risk analysis are obtained:
$$\tilde{S} = \{\tilde{s}_1, \ldots, \tilde{s}_c\}, \quad C = \{C_1, C_2, \ldots, C_M\} \tag{3}$$
where $\tilde{S}$ represents the cardiac auscultation audio from the different listening channels annotated with interpretability information, and $C$ represents the cardiac disease risk of the patient under the $M$ decoupled heterogeneous modalities.
Fig. 2.
Overview of the proposed method
Cardiac murmur detection and grading based on hierarchical multi-task learning (HMT)
Most existing research can only determine whether a cardiac murmur is present and whether it is abnormal in cardiac auscultation monitoring data, but cannot provide a scientific grading of cardiac murmurs. However, many murmurs (especially grade I or II cardiac murmurs) may be physiological or functional rather than pathological and carry no medical significance. Therefore, simply analyzing the presence or absence of cardiac murmurs cannot provide good decision support for the diagnosis of cardiac disease, and a solution that can scientifically grade cardiac murmurs is needed. However, performing both classification and grading simultaneously in multi-task learning is challenging because different tasks need to attend to different features. For example, the murmur classification task focuses on distinguishing whether there is another sound besides the main cardiac sound, while the murmur grading task focuses on characteristics such as the duration, shape, pitch, and quality of cardiac sounds. This inevitably leads to negative interference between the tasks when a single shared model is used for multi-task learning.
To address this issue, we propose a hierarchical multi-task learning-based module for cardiac murmur identification and grading as shown in Fig. 3, which avoids negative interactions between multi-task learning by constructing a hierarchical architecture.
Fig. 3.

The framework of HMT
Specifically, the cardiac auscultation audio monitoring data is first divided into multiple time windows, and a Fourier transform is performed on each time window to obtain the frequency-domain representation of the signal in that window. For a given time window $n$, the value of the $m$th sampling point $x_n(m)$ is weighted by a window function $w(m)$, multiplied by a complex exponential $e^{-j 2\pi k m / N}$, and then summed to obtain the contribution of the time window at frequency $k$, denoted as $X_n(k)$. By superimposing the frequency-domain representations of the different time windows, the distribution of the original signal in the time and frequency domains is obtained.
$$X_n(k) = \sum_{m=0}^{N-1} x_n(m)\, w(m)\, e^{-j 2\pi k m / N} \tag{4}$$
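As a concrete illustration, the per-window transform of Eq. (4) can be sketched in plain Python; the rectangular window and toy constant signal below are illustrative stand-ins for the Hanning-windowed audio used in the experiments.

```python
import cmath

def window_dft(x, w, k):
    """Contribution of one time window at frequency bin k (Eq. 4):
    X_n(k) = sum_m x(m) * w(m) * exp(-j*2*pi*k*m/N)."""
    N = len(x)
    return sum(x[m] * w[m] * cmath.exp(-2j * cmath.pi * k * m / N)
               for m in range(N))

# A constant signal under a rectangular window concentrates all
# energy in the k = 0 bin.
x = [1.0] * 8
w = [1.0] * 8                      # rectangular window, for illustration only
X0 = window_dft(x, w, 0)           # -> (8+0j)
X1 = window_dft(x, w, 1)           # magnitude ~ 0
```

Sliding this computation over overlapping windows and stacking the results per bin yields the time-frequency representation described above.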
Then, each channel’s signal is transformed from the time domain into a frequency-domain feature, and the feature $F_c$ of the $c$th channel is extracted using a ResNet network. The dimension of feature $F_c$ is $(h, w)$, and the channel-specific cardiac murmur category probabilities are obtained by applying an activation function (such as sigmoid or softmax) to $F_c$, where $c = 1, \ldots, C$ and $C$ is the number of different cardiac auscultation channels for the patient. By combining the cardiac murmur category probabilities predicted by all channels, the probability of the patient’s cardiac murmur category is obtained.
In addition to diagnosing whether a patient has a cardiac murmur, cardiac murmur grading is a more critical task. The purpose of grading is to determine the severity and possible etiology of the cardiac murmur, so that doctors can assess the patient’s risk of organic cardiac disease, develop appropriate treatment plans, track the development of the disease, and adjust treatment in time based on feedback about its effectiveness. To perform cardiac murmur grading, the features of each channel are stacked column-wise to obtain a matrix in which the $c$th column contains the features of the $c$th channel. The patient’s cardiac murmur grading is then obtained from this matrix.
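The text does not pin the per-channel fusion down to a specific operator; a minimal sketch, assuming probabilities are combined by simple averaging and features are stacked column-wise, might look like this (the logits below are hypothetical):

```python
import math

def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def combine_channel_probs(channel_logits):
    """Average the per-channel murmur-category probabilities into a
    patient-level prediction (one plausible combination rule)."""
    probs = [softmax(l) for l in channel_logits]
    n_channels = len(probs)
    return [sum(p[j] for p in probs) / n_channels
            for j in range(len(probs[0]))]

def stack_channel_features(features):
    """Stack per-channel feature vectors so that column c holds the
    features of channel c, as the grading head expects."""
    h = len(features[0])
    return [[f[i] for f in features] for i in range(h)]

# Hypothetical logits for 3 auscultation channels over 3 murmur classes:
logits = [[2.0, 0.1, -1.0], [1.5, 0.3, -0.5], [1.8, -0.2, -0.8]]
patient_probs = combine_channel_probs(logits)   # sums to 1.0
```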
Cardiac risk analysis method based on heterogeneous multi-modal feature impact adaptation (HMA)
To address the challenge of long time-series data dominating heterogeneous multi-modal data, this paper proposes a cardiac disease risk analysis method based on the adaptive influence of heterogeneous multi-modal features. The HMT module is used to construct structured representations of the cardiac murmur class and grade from the unstructured modality, thereby transforming the unstructured modality into a structured representation. An adaptive modality weight learning mechanism is then used to balance the unstructured time-series monitoring modality and the structured demographic modality, achieving prediction of the risk of cardiac disease.
Specifically, the HMT module is used to transform the unstructured modality $S$ into two structured embedded representations, $e_d$ for cardiac murmur classification and $e_g$ for grading. These are then concatenated with the structured modality composed of demographic information to form a feature vector $x$, which is used as the input to the model.
To achieve an adaptive balance between the unstructured and structured modalities, this paper proposes a multi-modal importance evaluation method that can dynamically adjust the influence of different modalities. Assuming the model’s learnable parameter vector is $\theta$, the cardiac disease prediction result $\hat{y}_r$ can be represented as
$$\hat{y}_r = \sigma\!\left(\theta^{\top} x\right) \tag{5}$$
where the sigmoid activation function $\sigma(\cdot)$ maps the result of $\theta^{\top} x$ to a probability value between 0 and 1.
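A minimal sketch of the prediction step in Eq. (5), with the parameter vector and the input feature vector as plain Python lists:

```python
import math

def predict_risk(theta, x):
    """Eq. (5): map the linear score theta.x to a probability via the
    sigmoid. x concatenates the murmur embeddings with the demographic
    features; theta holds the learnable coefficients."""
    z = sum(t * f for t, f in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))
```

With all-zero coefficients the score is 0 and the predicted risk is exactly 0.5, the decision boundary of the sigmoid.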
Assuming there are $N$ patients, for each patient $i$ the model obtains a predicted risk analysis value $\hat{y}_i$. The cross-entropy loss function is used to measure the difference between the predicted value $\hat{y}_i$ and the actual value $y_i$:
$$L_{CE} = -\frac{1}{N} \sum_{i=1}^{N} \left[\, y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right] \tag{6}$$
where $y_i$ is the true risk assessment value of the $i$th sample, and $\hat{y}_i$ is the predicted risk assessment value of the $i$th sample.
To ensure the model’s generalization ability and prevent overfitting, a norm penalty is added as a regularization term $\lambda \sum_{j=1}^{M} \|\theta_j\|$, where $\lambda$ is the regularization coefficient used to control the weight of the regularization term, and $M$ is the number of modalities.
Combining the cross-entropy loss function and the regularization term, we obtain the final loss function:
$$L(\theta) = L_{CE} + \lambda \sum_{j=1}^{M} \left\|\theta_j\right\| \tag{7}$$
The goal of training the model is to find a set of parameters $\theta$ that minimizes the loss function $L(\theta)$.
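The combined objective can be sketched as follows; since the text leaves the norm in the regularization term unspecified, an L2 norm per modality group is assumed here purely for illustration:

```python
import math

def total_loss(theta_by_modality, y_true, y_pred, lam=0.01):
    """Eq. (7): mean cross-entropy (Eq. 6) plus a norm penalty summed
    over the modality-specific coefficient groups. An L2 group norm is
    an assumption; the paper does not name the norm."""
    n = len(y_true)
    ce = -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
              for y, p in zip(y_true, y_pred)) / n
    reg = lam * sum(math.sqrt(sum(t * t for t in group))
                    for group in theta_by_modality)
    return ce + reg
```

With `lam=0` this reduces to the plain cross-entropy of Eq. (6); increasing `lam` shrinks the per-modality coefficient groups toward zero.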
Multi-task explainable learning module
In order to provide diagnostic evidence for the intelligent diagnosis of cardiac murmurs and their grading, a portion of the data in the feature $F_c$ of each channel $c$ is masked to obtain the masked data $F_c^{mask}$, which is then fed into the murmur prediction model to obtain the predicted result $\hat{y}^{mask}$. The prediction accuracy on the masked data is measured with an error function (such as mean squared error or cross-entropy) to obtain the error value $e$:
$$e = \operatorname{Err}\!\left(\hat{y}^{mask},\, y\right) \tag{8}$$
The mask is then randomly moved to other locations, and the above steps are repeated to obtain a prediction error value $e$ for each sample point.
Using methods such as SHAP (SHapley Additive exPlanations) on the prediction error values $e$, the importance of each sample point for the diagnosis of cardiac murmurs and their grading can be calculated. By evaluating this importance for every sample point in each channel, the data segments that matter most for diagnosis can be identified, providing more accurate diagnostic evidence and auxiliary decision support for cardiac murmur detection and grading.
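The masking-based importance evaluation can be sketched with a toy model; the helper below, its zero-mask choice, and the squared-error function are illustrative assumptions, not the paper's exact procedure:

```python
def mask_importance(model, x, y, seg_len):
    """Slide a zero-mask over consecutive segments of one channel's
    features; the rise in squared prediction error when a segment is
    hidden (Eq. 8, repeated per position) is that segment's importance."""
    base = (model(x) - y) ** 2
    scores = []
    for start in range(0, len(x), seg_len):
        masked = list(x)
        for i in range(start, min(start + seg_len, len(x))):
            masked[i] = 0.0            # hide this segment
        err = (model(masked) - y) ** 2
        scores.append(err - base)      # importance of this segment
    return scores

# Toy "model" that only reads the first two features: masking them
# should matter, while masking the tail should not.
model = lambda v: 0.5 * v[0] + 0.5 * v[1]
scores = mask_importance(model, [1.0, 1.0, 0.3, 0.4], 1.0, 2)
```

Here the first segment receives a positive importance score and the second a score of zero, matching the intuition that only the features the model actually uses are diagnostic.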
To evaluate the influence of the multi-modal data on the prediction results, the absolute value $|\theta_i|$ of the coefficient of each modality is calculated and then normalized using the L1 norm to obtain the influence magnitude $w_i$ of each modality:
$$w_i = \frac{\left|\theta_i\right|}{\sum_{j=1}^{M} \left|\theta_j\right|} \tag{9}$$
where $w_i$ represents the influence of the $i$th modality, $|\theta_i|$ represents the absolute value of the coefficient of the $i$th modality, and $\sum_{j=1}^{M} |\theta_j|$ represents the sum of the absolute values of the coefficients of all modalities.
Finally, the contribution of each modality to the diagnosis of cardiac disease is obtained by multiplying the influence of the modality by the corresponding risk analysis value. Assuming the influence of the $i$th modality is $w_i$ and the corresponding risk analysis result is $r_i$, the cardiac disease diagnosis for the single modality $i$ can be expressed as:
$$C_i = w_i \cdot r_i \tag{10}$$
The final prediction of organic cardiac disease risk, $C$, can then be equivalently expressed as the sum of the per-modality risks:
$$C = \sum_{i=1}^{M} C_i \tag{11}$$
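Eqs. (9)-(11) amount to a few lines of arithmetic; the coefficients and per-modality risk values in the example are hypothetical:

```python
def modality_contributions(coefs, risks):
    """Eqs. (9)-(11): L1-normalise the absolute modality coefficients
    into influences w_i, weight each modality's risk r_i by w_i, and
    sum the per-modality contributions into the overall risk C."""
    total = sum(abs(c) for c in coefs)
    w = [abs(c) / total for c in coefs]              # Eq. (9)
    contrib = [wi * ri for wi, ri in zip(w, risks)]  # Eq. (10)
    return w, contrib, sum(contrib)                  # Eq. (11)

# Hypothetical: three modalities (murmur class, murmur grade, demographics).
w, contrib, overall = modality_contributions([2.0, -1.0, 1.0],
                                             [0.9, 0.7, 0.2])
```

The decomposition makes the overall risk auditable: each modality's share of the prediction is visible before the sum is taken.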
Experiment
Dataset
The dataset used for this study is a subset of a larger dataset, namely the CirCor DigiScope PCG dataset [43, 44]. The dataset was collected from two screening events conducted in the Northeast region of Brazil during the months of July and August in 2014, as well as June and July in 2015. The dataset comprises recordings of cardiac sounds, ranging in duration from 5 to 45 s, along with demographic information such as age, gender, height, weight, and pregnancy status.
In total, there are 1568 patients included in the dataset, with 942 patients assigned to the training set. For each patient, a maximum of six segments of cardiac sound recordings were available, resulting in a complete dataset of 5272 recordings, with the training set consisting of 3163 recordings. Each recording originates from one of the following positions: pulmonary valve, aortic valve, mitral valve, tricuspid valve, or other. Furthermore, each patient is labeled with a cardiac murmur status (present, absent, or unknown), a murmur grading (a grading of the murmur in the diastolic period according to the Levine scale), and a clinical outcome status (pathological cardiopathy or non-pathological cardiopathy).
For data preprocessing, we first use a sliding-window short-time Fourier transform to determine the frequency and phase components of each signal segment and extract the logarithmic Mel spectrogram [7]. The transform varies over time to capture frequency and phase information, and the frequency axis is further mapped to a logarithmic Mel scale to preserve the distances between human-perceived pitches [8, 9]. From the logarithmic Mel spectrogram, we extract signal features from each recording, including time-domain features, frequency-domain features, spectral centroid, roll-off, bandwidth, etc. When computing the spectrogram, we divide each recording into overlapping segments, each 4 s long with a step size of 1 s. For each segment, we use a periodic Hanning window [54] with a window length of 25 milliseconds and a step size of 10 milliseconds to calculate a 64-coefficient spectrogram using a fast Fourier transform. The minimum and maximum frequencies of the spectrogram are 10 Hz and 2000 Hz, respectively.
In addition, to process the demographic data, we convert pregnancy status into a binary variable and combine it with age, gender, weight, and height, one-hot encoding the categorical fields.
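A sketch of this encoding; the age-group vocabulary and the field order are assumptions, since the paper does not spell out the exact layout:

```python
def encode_demographics(age_group, gender, pregnant, height_cm, weight_kg,
                        age_groups=("Neonate", "Infant", "Child",
                                    "Adolescent", "Young Adult")):
    """One-hot the categorical fields and append the numeric ones.
    The age-group vocabulary mirrors the CirCor metadata categories;
    this is one plausible layout, not the paper's exact one."""
    age_vec = [1.0 if age_group == g else 0.0 for g in age_groups]
    gender_vec = [1.0 if gender == "Female" else 0.0,
                  1.0 if gender == "Male" else 0.0]
    preg = [1.0 if pregnant else 0.0]    # pregnancy as a binary variable
    return age_vec + gender_vec + preg + [height_cm, weight_kg]

x = encode_demographics("Child", "Female", False, 120.0, 25.0)
```

The resulting fixed-length vector is what gets concatenated with the structured murmur representations in the HMA module.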
Evaluations
In this experiment, we merge all the datasets after preprocessing and then randomly split the whole dataset into 60%, 20%, and 20% as the training, validation, and test sets, respectively. The main parameters of the method were a learning rate of 0.00003, a training batch size of 200 samples, and 10 iterations.
In order to evaluate the performance of the method, we calculated the accuracy, recall, and other metrics of the model on the test data. The main metrics are: Precision ($P = \frac{TP}{TP + FP}$), Recall ($R = \frac{TP}{TP + FN}$), F1 ($F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$), and Accuracy ($\frac{TP + TN}{TP + TN + FP + FN}$), where TP (True Positive) denotes the number of samples that were actually positive and predicted to be positive, TN (True Negative) denotes the number of samples that were actually negative and predicted to be negative, FP (False Positive) denotes the number of samples that were actually negative but predicted to be positive, FN (False Negative) denotes the number of samples that were actually positive but predicted to be negative, and $\beta$ is the weight coefficient, which takes the value of 1.
Besides, we also use AUPRC (Area Under the Precision-Recall Curve, $\mathrm{AUPRC} = \int_0^1 P(r)\, \mathrm{d}R(r)$), where $P$ denotes the precision rate, $R$ denotes the recall rate, and $r$ denotes the decision threshold of the classifier; and Weighted Accuracy ($WA = \sum_{i=1}^{n} w_i \cdot acc_i$), where $w_i$ denotes the weight of the $i$th class of samples and $acc_i$ denotes the accuracy of the $i$th class. Suppose there are $n$ classes of samples, where the number of samples in the $i$th class is $n_i$, so that $w_i = n_i / \sum_{j=1}^{n} n_j$. Then the accuracy of the $i$th class is $acc_i = \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}$, where $TP_i$, $TN_i$, $FP_i$, and $FN_i$ denote the number of True Positives, True Negatives, False Positives, and False Negatives for samples of the $i$th class, respectively.
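The weighted accuracy can be computed directly from per-class confusion counts; the sketch below assumes one-vs-rest counts, where the size of class $i$ is $TP_i + FN_i$:

```python
def weighted_accuracy(per_class_counts):
    """Weighted accuracy: each class's accuracy
    (TP_i + TN_i) / (TP_i + TN_i + FP_i + FN_i), weighted by that
    class's share of the samples n_i / N, with n_i = TP_i + FN_i
    under a one-vs-rest counting convention (an assumption here)."""
    N = sum(tp + fn for tp, tn, fp, fn in per_class_counts)
    wa = 0.0
    for tp, tn, fp, fn in per_class_counts:
        acc_i = (tp + tn) / (tp + tn + fp + fn)
        wa += ((tp + fn) / N) * acc_i
    return wa
```

A perfectly classified dataset yields a weighted accuracy of 1.0 regardless of class imbalance, while errors are discounted by the size of the class they occur in.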
Model performance analysis
In the task of cardiac murmur detection, our method achieved the best performance among all methods, as demonstrated by comparison with relevant detection methods on the George B. Moody PhysioNet Challenge 2022 dataset (see Table 1). Our method achieved an AUPRC value of 77.8%, an F1 value of 76.5%, an accuracy of 88.4%, and a weighted accuracy of 78.3%, which respectively improved by 3.9% - 47.3%, 0.13% - 53.3%, 7.5% - 25.2%, and 0.9% - 23.8% over the compared methods, including the state-of-the-art method for cardiac murmur detection [53]. Traditional feature-based methods require manual feature extraction before heart murmur detection, which can be severely affected by noise or interference such as breathing and baseline drift, degrading model performance; this is the main reason why such methods are rarely used nowadays. Compared with existing deep learning methods or combined methods, our method uses filtering, data augmentation, and multi-task learning techniques to increase data diversity and model generalization, thereby improving performance.
Table 1.
Model performance on the task of cardiac murmur detection
| Categories | Methods | AUPRC | F1 | Accuracy | Weighted accuracy |
|---|---|---|---|---|---|
| Feature-engineered methods | Wavelet scattering + SVM [45] | 0.684 | 0.686 | 0.778 | 0.771 |
| Deep-learning-based methods | CRNN [46] | 0.599 | 0.499 | 0.725 | 0.632 |
| | Transformer [47] | 0.569 | 0.571 | 0.769 | 0.690 |
| | CNN + MFCC [48] | 0.579 | 0.489 | 0.759 | 0.694 |
| | Self-supervised learning + MLP [49] | 0.656 | 0.597 | 0.706 | 0.737 |
| | ResNet [50] | 0.621 | 0.521 | 0.786 | 0.767 |
| | RNN [34] | 0.528 | 0.623 | 0.763 | 0.776 |
| | CNN [51] | 0.716 | 0.619 | 0.801 | 0.780 |
| Combined methods | CRNN + SVM [40] | 0.749 | 0.764 | 0.748 | 0.745 |
| | ResNet + RF [52] | 0.587 | 0.586 | 0.761 | 0.756 |
| | CNN + RF [53] | 0.561 | 0.647 | 0.822 | 0.776 |
| Ours | | **0.778** | **0.765** | **0.884** | **0.783** |
Bold values indicate the best results compared to the state-of-the-art methods
Meanwhile, to address the limitation that existing methods can only detect the presence of cardiac murmurs but cannot grade them scientifically, we used a hierarchical multi-task learning mechanism to build a cardiac murmur grading module based on the Levine scale without degrading the accuracy of cardiac murmur detection. As shown in Fig. 4, the module achieved F1 values of 93.36%, 26.96%, 26.25%, and 74.62% on the NAN, I/VI, II/VI, and III/VI categories, respectively, with an average F1 value of 55.2975%, showing that it can grade cardiac murmurs according to the Levine scale. However, the model performed poorly on the I/VI and II/VI categories. Diseases are naturally rare, so cardiac murmur monitoring datasets are inherently skewed; in particular, the sample sizes of non-pathological (physiological or functional) murmurs in the I/VI and II/VI categories are too small. As a result, the model is not fully trained on these few-sample categories and easily confuses I/VI and II/VI murmurs with healthy individuals without murmurs, hurting performance on these two categories. Nevertheless, I/VI and II/VI murmurs are usually physiological or functional and have little clinical significance, so this confusion is acceptable, and our model recognizes the clinically relevant pathological murmurs of III/VI patients well.
Fig. 4.

Model performance on the task of cardiac murmur grading
In the task of identifying organic cardiac diseases, as shown in Fig. 5, the Precision (0.704), Recall (0.520), and F1-score (0.598) for identifying organic cardiac diseases (the Abnormal category) are compared with the corresponding metrics (0.610, 0.774, 0.682) for identifying healthy individuals (the Normal category without organic cardiac diseases). Overall, the performance on the Normal category is slightly better than on the Abnormal category, probably because the Normal class has relatively more samples, making it easier for the classifier to identify correctly. Nevertheless, the model performs similarly on the organic and non-organic categories, with an overall F1-score of 0.64, and can accurately identify patients with organic cardiac diseases from cardiac sounds.
Fig. 5.

Comparative analysis of classification metrics for clinical outcome (heart disease analysis)
Model parameter analysis
The audio sampling rate may affect model performance in the cardiac murmur and disease risk analysis method based on heterogeneous modality adaptive multi-task learning. A higher sampling rate provides more audio information and can potentially improve classification performance, but it also increases the computational burden and model complexity. Figure 6 shows the impact of the sampling rate on model performance, i.e., the relationship between the audio sampling rate and AUROC, AUPRC, F-measure, Accuracy, and Weighted Accuracy. Weighted Accuracy gradually increases with the sampling rate, and the model's performance slightly improves; however, performance starts to decline after the sampling rate reaches 6000 Hz. This may be because an excessively high sampling rate can cause overfitting and increase the computational burden, leading to poor generalization and reduced accuracy.
Fig. 6.

Impact of audio sampling rate on performance metrics of the HMA model for cardiac sound analysis
Multi-channel fusion is crucial for obtaining the best performance in cardiac murmur monitoring, since murmur auscultation often requires simultaneous monitoring on multiple auscultation channels. We investigated three methods for fusing multiple auscultation channels (average pooling, maximum pooling, and minimum pooling) and compared their performance on AUROC, AUPRC, F-measure, Accuracy, and Weighted Accuracy. According to Fig. 7, average pooling achieves the best model performance across all five metrics. This may be because mean pooling better exploits the complementary information between different auscultation channels, yielding more accurate predictions. In contrast, maximum or minimum pooling may ignore useful information or over-emphasize noise, degrading model performance.
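The three fusion strategies each reduce per-channel predictions to a single score; a toy sketch with made-up murmur probabilities from four auscultation positions:

```python
import numpy as np

# made-up murmur probabilities from four auscultation channels (AV, MV, PV, TV)
channel_probs = np.array([0.20, 0.40, 0.10, 0.30])

avg_fused = channel_probs.mean()   # average pooling: uses all channels
max_fused = channel_probs.max()    # maximum pooling: most confident channel
min_fused = channel_probs.min()    # minimum pooling: least confident channel
print(avg_fused, max_fused, min_fused)
```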
Fig. 7.
Impact of fusion methods on performance metrics in multi-channel integration
Model interpretability analysis
Because cardiac sound monitoring involves minutes-long audio recordings, doctors often find it difficult to accurately identify and locate disease information in such long time-series data. To address this issue, our method automatically focuses on the diseased regions in cardiac audio data, building on cardiac murmur identification, murmur grading, and organic cardiac disease risk analysis, to provide more accurate decision support for doctors. The proposed method can automatically capture abnormalities from long time-series audio monitoring data, providing key information such as the location and features of cardiac murmurs and high-risk monitoring segments. For example, as shown in Fig. 8, the model can automatically focus on decrescendo murmurs and other murmur segments, highlighting these high-risk areas in red. Compared with the blue normal cardiac sound segments, the first red-highlighted murmur segment represents the decrescendo murmur of aortic regurgitation, providing decision-making support for the diagnosis of cardiac diseases such as aortic regurgitation.
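The mask-based importance idea behind this highlighting can be sketched with a simple occlusion test; here the energy-based score function is a purely illustrative stand-in for a trained murmur classifier:

```python
import numpy as np

def occlusion_importance(signal, score_fn, win, hop):
    """Zero out successive windows and record the score drop; windows whose
    removal hurts the score most are the segments the model relies on."""
    base = score_fn(signal)
    drops = []
    for start in range(0, len(signal) - win + 1, hop):
        masked = signal.copy()
        masked[start:start + win] = 0.0
        drops.append(base - score_fn(masked))
    return np.array(drops)

score = lambda s: float(np.mean(s ** 2))   # toy stand-in for a murmur score
sig = np.zeros(1000)
sig[400:500] = 1.0                         # synthetic "murmur" burst
imp = occlusion_importance(sig, score, win=100, hop=100)
print(int(imp.argmax()))                   # 4: the window covering samples 400-500
```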
Fig. 8.

Automatic detection of cardiac murmurs for efficient and accurate diagnosis
In addition, we analyzed the abnormal segments automatically captured by the model. As shown in Fig. 9, the model can effectively capture murmur segment areas of typical mitral regurgitation (MR), aortic stenosis (AS) murmurs, etc., from a large amount of long-term monitoring data, providing doctors with rapid diagnostic decision-making support. For example, if the model automatically detects an AS murmur, it can not only help the doctor quickly focus on the possibility of aortic stenosis, but also make a timely diagnosis and formulate the best treatment strategy. If the model automatically detects an MR type of murmur segment, it can prompt the doctor to pay attention to the possibility of mitral regurgitation, take measures in time, and perform additional examinations such as echocardiography or cardiac catheterization to determine a treatment plan, preventing high-risk events such as mitral valve prolapse, rheumatic cardiac disease, or myocardial infarction-induced sudden death.
Fig. 9.

Automatic localization of cardiac murmurs in cardiac waveform data
To further analyze the risk of organic cardiac disease, we constructed a cardiac murmur risk analysis nomogram [55], as shown in Fig. 10. The Point represents the risk score of organic cardiac disease corresponding to a single heterogeneous mode, with a data range of 1 to 100. The Overall point represents the sum of the disease risk corresponding to multiple heterogeneous modes, with a data range of 1 to 150.
Fig. 10.

The nomogram of cardiac murmur risk analysis
Specifically, according to the model, the disease risk corresponding to each heterogeneous mode can be calculated as Score_P = β_P · P, Score_w = β_w · w, and Score_G = β_G · G, where P represents pregnancy status, w represents the probability of having a cardiac murmur, G represents the murmur grade, and β_i represents the diagnostic influence of mode i on organic cardiac disease. By calculating the disease risk corresponding to each heterogeneous mode, the total disease risk of the patient can be obtained, thereby predicting the patient's risk of cardiac disease:

Overall = Score_P + Score_w + Score_G      (12)
Suppose there is a patient who is not pregnant (P = -1), has a 0.7 probability of having a cardiac murmur (w = 0.7), and has a murmur grade of 3 (G = 3). Given the fitted coefficients β_P, β_w, and β_G, we first calculate the disease risk score corresponding to each heterogeneous mode, and then add up these scores to obtain the total disease risk. For this patient, the resulting risk score for organic cardiac disease is 118.5. According to the nomogram shown in Fig. 10, the threshold of organic cardiac disease probability is 0.78 and its corresponding threshold of overall points is 60. This patient is therefore at risk of organic cardiac disease, as his overall points far exceed the threshold.
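To make the additive scoring concrete, here is a sketch with purely hypothetical coefficients (the β values below are NOT the fitted nomogram coefficients from the paper, so the resulting total deliberately differs from the 118.5 in the worked example):

```python
# Hypothetical mode coefficients; the paper's fitted nomogram values are not used here.
beta = {"P": 5.0, "w": 50.0, "G": 20.0}
x = {"P": -1, "w": 0.7, "G": 3}              # the patient from the worked example

scores = {k: beta[k] * x[k] for k in beta}   # per-mode risk scores (Eq. 12's terms)
overall = sum(scores.values())               # additive total risk
print(scores, overall)  # {'P': -5.0, 'w': 35.0, 'G': 60.0} 90.0
```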
In other words, the nomogram can more intuitively display the risk of organic cardiac disease generated by different heterogeneous modes, and the disease risk of different heterogeneous modes can be added up to obtain the total disease risk of the patient, which helps doctors and patients better understand and predict the risk of disease.
Conclusion
This paper proposes a method for cardiac murmur grading and cardiac disease risk analysis based on heterogeneous modality adaptive multi-task learning. The method employs a HMT approach to detect and grade cardiac murmurs, effectively avoiding negative effects between the two tasks. Additionally, a HMA method is introduced to balance the representation ability of structured demographic modality data and unstructured cardiac sound sequence modality data, resulting in accurate analysis of cardiac disease risk. The utilization of a multi-task interpretable learning module enhances model interpretability. Experimental results on public datasets demonstrate the effectiveness of the proposed method in cardiac murmur detection, grading, and cardiac disease risk analysis, providing valuable diagnostic evidence for doctors and showcasing its potential for clinical application.
In the future, there are two main areas of improvement to focus on. Firstly, it is important to explore and integrate a wider range of data modalities beyond the structured demographic data and unstructured cardiac sound sequence data. By incorporating more diverse data sources such as ECG patterns or patient historical data, the model's accuracy and predictive capabilities could be further enhanced, contributing to a more comprehensive and robust risk analysis model. Secondly, the development of more interpretable models is crucial to provide transparent and easy-to-understand decision-making processes for doctors and patients, fostering trust and facilitating informed medical decision-making.
In conclusion, while this paper presents a robust and innovative approach to cardiac murmur detection and grading that integrates advanced machine learning techniques, there remain opportunities for further improvement. A more comprehensive evaluation and a broader discussion of real-world applicability would strengthen the contributions. By addressing these limitations and exploring the integration of diverse data modalities and interpretability, future research can advance cardiac murmur detection and grading toward more accurate and clinically applicable risk analysis models for cardiovascular diseases.
Acknowledgements
The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China under Grants 62202332 and 62102008, and the Diversified Investment Foundation of Tianjin under Grants 21JCQNJC00980 and 21JCQNJC01510.
Declarations
Conflict of interest
The authors declare no potential conflict of interest.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Chenyang Xu and Xin Li have contributed equally to this work.
Contributor Information
Yuxi Zhou, Email: joy_yuxi@pku.edu.cn.
Shenda Hong, Email: hongshenda@pku.edu.cn.
References
- 1.Qiu D, Cheng Y, Wang X. Dual u-net residual networks for cardiac magnetic resonance images super-resolution. Comput Methods Programs Biomed. 2022;218:106707. 10.1016/j.cmpb.2022.106707.
- 2.Weber C, Noels H. Atherosclerosis: current pathogenesis and therapeutic options. Nat Med. 2011;17(11):1410–22.
- 3.Zhang D, Chen Y, Chen Y, Ye S, Cai W, Jiang J, Xu Y, Zheng G, Chen M. Heart disease prediction based on the embedded feature selection method and deep neural network. J Healthcare Eng. 2021;2021:1–9. (Retracted)
- 4.Shimizu I, Minamino T. Physiological and pathological cardiac hypertrophy. J Mol Cell Cardiol. 2016;97:245–62.
- 5.Buja LM. Myocardial ischemia and reperfusion injury. Cardiovasc Pathol. 2005;14(4):170–5.
- 6.Coffey S, Cairns BJ, Iung B. The modern epidemiology of heart valve disease. Heart. 2016;102(1):75–85.
- 7.Wang F, Syeda-Mahmood T, Beymer D. Finding disease similarity by combining ecg with heart auscultation sound. In: 2007 Computers in Cardiology, 2007; pp. 261–264. IEEE.
- 8.Reed TR, Reed NE, Fritzson P. Heart sound analysis for symptom detection and computer-aided diagnosis. Simul Model Pract Theory. 2004;12(2):129–46.
- 9.Randhawa SK, Singh M. Classification of heart sound signals using multi-modal features. Procedia Computer Science. 2015;58:165–171. 10.1016/j.procs.2015.08.045. Second International Symposium on Computer Vision and the Internet (VisionNet'15).
- 10.Mustafa M, Abdalla G, Manimurugan S, Alharbi AR. Detection of heartbeat sounds arrhythmia using automatic spectral methods and cardiac auscultatory. J Supercomput. 2020;76:5899–922.
- 11.Liu J, Wang H, Yang Z, Quan J, Liu L, Tian J. Deep learning-based computer-aided heart sound analysis in children with left-to-right shunt congenital heart disease. Int J Cardiol. 2022;348:58–64. 10.1016/j.ijcard.2021.12.012.
- 12.Davidsen AH, Andersen S, Halvorsen PA, Schirmer H, Reierth E, Melbye H. Diagnostic accuracy of heart auscultation for detecting valve disease: a systematic review. BMJ Open. 2023;13(3):068121.
- 13.Mangione S, Nieman LZ, Gracely E, Kaye D. The teaching and practice of cardiac auscultation during internal medicine and cardiology training: a nationwide survey. Ann Intern Med. 1993;119(1):47–54.
- 14.Voigt I, Boeckmann M, Bruder O, Wolf A, Schmitz T, Wieneke H. A deep neural network using audio files for detection of aortic stenosis. Clin Cardiol. 2022;45(6):657–63.
- 15.Levin AD, Ragazzi A, Szot SL, Ning T. Extraction and assessment of diagnosis-relevant features for heart murmur classification. Methods. 2022;202:110–6.
- 16.Biancaniello T. Innocent murmurs. Circulation. 2005;111(3):20–2.
- 17.Elola A, Aramendi E, Oliveira J, Renna F, Coimbra MT, Reyna MA, Sameni R, Clifford GD, Rad AB. Beyond heart murmur detection: automatic murmur grading from phonocardiogram. IEEE J Biomed Health Inform. 2023. 10.1109/JBHI.2023.3275039.
- 18.Vandenhende S, Georgoulis S, Van Gansbeke W, Proesmans M, Dai D, Van Gool L. Multi-task learning for dense prediction tasks: a survey. IEEE Trans Pattern Anal Mach Intell. 2021;44(7):3614–33.
- 19.Hao Y, Usama M, Yang J, Hossain MS, Ghoneim A. Recurrent convolutional neural network based multimodal disease risk prediction. Futur Gener Comput Syst. 2019;92:76–83.
- 20.Safara F, Doraisamy S, Azman A, Jantan A, Ramaiah ARA. Multi-level basis selection of wavelet packet decomposition tree for heart sound classification. Comput Biol Med. 2013;43(10):1407–14.
- 21.Debbal S, Bereksi-Reguig F. Computerized heart sounds analysis. Comput Biol Med. 2008;38(2):263–80.
- 22.Rath A, Mishra D, Panda G, Pal M. Development and assessment of machine learning based heart disease detection using imbalanced heart sound signal. Biomed Signal Process Control. 2022;76:103730.
- 23.Zeinali Y, Niaki STA. Heart sound classification using signal processing and machine learning algorithms. Mach Learn Appl. 2022;7:100206.
- 24.Chen K, Mudvari A, Barrera FG, Cheng L, Ning T. Heart murmurs clustering using machine learning. In: 2018 14th IEEE International Conference on Signal Processing (ICSP), 2018; pp. 94–98. IEEE.
- 25.Delgado-Trejos E, Quiceno-Manrique A, Godino-Llorente J, Blanco-Velasco M, Castellanos-Dominguez G. Digital auscultation analysis for heart murmur detection. Ann Biomed Eng. 2009;37:337–53.
- 26.Kotb MA, Nabih H, El Zahraa F, El Falaki M, Shaker CW, Refaey MA, Rjoob K. Improving the recognition of heart murmur. Int J Adv Comput Sci Appl. 2016;7(7):283–7.
- 27.Kotb MA, Elmahdy HN, Mostafa FEZ, Shaker CW, Refaey MA, Rjoob KWY. Recognition of heart murmur based on machine learning and visual based analysis of phonocardiography. In: Intelligent Computing: Proceedings of the 2018 Computing Conference, 2019; Volume 2, pp. 188–202. Springer.
- 28.Xiao B, Xu Y, Bi X, Zhang J, Ma X. Heart sounds classification using a novel 1-d convolutional neural network with extremely low parameter consumption. Neurocomputing. 2020;392:153–9.
- 29.Patwa A, Rahman MMU, Al-Naffouri TY. Heart murmur and abnormal pcg detection via wavelet scattering transform & a 1d-cnn. arXiv preprint arXiv:2303.11423 (2023).
- 30.Oh SL, Jahmunah V, Ooi CP, Tan R-S, Ciaccio EJ, Yamakawa T, Tanabe M, Kobayashi M, Acharya UR. Classification of heart sound signals using a novel deep wavenet model. Comput Methods Programs Biomed. 2020;196:105604.
- 31.Venkataramani VV, Garg A, Priyakumar UD. Modified variable kernel length resnets for heart murmur detection and clinical outcome prediction using multi-positional phonocardiogram recordings.
- 32.Raza A, Mehmood A, Ullah S, Ahmad M, Choi GS, On B-W. Heartbeat sound signal classification using deep learning. Sensors. 2019;19(21):4819.
- 33.Li J, Ke L, Du Q, Ding X, Chen X. Research on the classification of ecg and pcg signals based on bilstm-googlenet-ds. Appl Sci. 2022;12(22):11762.
- 34.McDonald A, Gales MJ, Agarwal A. Detection of heart murmurs in phonocardiograms with parallel hidden semi-markov models. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 35.Wang Z-H, Horng G-J, Hsu T-H, Aripriharta A, Jong G-J. Heart sound signal recovery based on time series signal prediction using a recurrent neural network in the long short-term memory model. J Supercomput. 2020;76:8373–90.
- 36.Freeman A, Levine SA. The clinical significance of the systolic murmur: a study of 1000 consecutive "non-cardiac" cases. Ann Intern Med. 1933;6(11):1371–85.
- 37.He Y, Li W, Zhang W, Zhang S, Pi X, Liu H. Research on segmentation and classification of heart sound signals based on deep learning. Appl Sci. 2021;11(2):651.
- 38.Bondareva E, Xia T, Han J, Mascolo C. Towards uncertainty-aware murmur detection in heart sounds via tandem learning. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 39.Araujo M, Zeng D, Palotti J, Xi X, Shi Y, Pyles L, Ni Q. Maiby's algorithm: A two-stage deep learning approach for murmur detection in mel spectrograms for automatic auscultation of congenital heart disease. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 40.Gündüz AF, Fatih T. Pcg frame classification by classical machine learning methods using spectral features and mfcc based features. Avrupa Bilim ve Teknoloji Dergisi. 2022;42:77–82.
- 41.Chang Y, Liu L, Antonescu C. Multi-task prediction of murmur and outcome from heart sound recordings. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. 10.22489/CinC.2022.309.
- 42.Keren R, Tereschuk M, Luan X. Evaluation of a novel method for grading heart murmur intensity. Arch Pediatr Adolesc Med. 2005;159(4):329–34. 10.1001/archpedi.159.4.329.
- 43.Oliveira J, Renna F, Costa P, Nogueira M, Oliveira AC, Elola A, Ferreira C, Jorge A, Rad AB, Reyna M, et al. The circor digiscope phonocardiogram dataset, version 1.0.0 (2022).
- 44.Oliveira J, Renna F, Costa PD, Nogueira M, Oliveira C, Ferreira C, Jorge A, Mattos S, Hatem T, Tavares T, Elola A, Rad AB, Sameni R, Clifford GD, Coimbra MT. The circor digiscope dataset: From murmur detection to murmur classification. IEEE J Biomed Health Inform. 2022;26(6):2524–35. 10.1109/JBHI.2021.3137048.
- 45.Cornely AK, Mirsky GM. Heart murmur detection using wavelet time scattering and support vector machines. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 46.Imran Z, Grooby E, Malgi VV, Sitaula C, Aryal S, Marzbanrad F. A fusion of handcrafted feature-based and deep learning classifiers for heart murmur detection. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 47.Kim J, Park G, Suh B. Classification of phonocardiogram recordings using vision transformer architecture. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. 10.22489/CinC.2022.084.
- 48.Chang Y, Liu L, Antonescu C. Multi-task prediction of murmur and outcome from heart sound recordings. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 49.Ballas A, Papapanagiotou V, Delopoulos A, Diou C. Listen2yourheart: A self-supervised approach for detecting murmur in heart-beat sounds. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 50.Lee J, Kang T, Kim N, Han S, Won H, Gong W, Kwak I-Y. Deep learning based heart murmur detection using frequency-time domain features of heartbeat sounds. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 51.Lu H, Yip JB, Steigleder T, Grießhammer S, Heckel M, Jami NVSJ, Eskofier B, Ostgathe C, Koelpin A. A lightweight robust approach for automatic heart murmurs and clinical outcomes classification from phonocardiogram recordings. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 52.Bruoth E, Bugata P, Gajdoš D, Hudák D, Kmečová V, Staňková M, Szabari A, Vozáriková G, et al. Murmur identification using supervised contrastive learning. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 53.Xu Y, Bao X, Lam H-K, Kamavuako EN. Hierarchical multi-scale convolutional network for murmurs detection on pcg signals. In: 2022 Computing in Cardiology (CinC), 2022; vol. 498, pp. 1–4. IEEE.
- 54.Testa A, Gallo D, Langella R. On the processing of harmonics and interharmonics: using Hanning window in standard framework. IEEE Trans Power Deliv. 2004;19(1):28–34. 10.1109/TPWRD.2003.820437.
- 55.Hong H, Hong S. simplenomo: a python package of making nomograms for visualizable calculation of logistic regression models. Health Data Sci. 2023;3:0023.