Abstract
Objective: Given the psychosocial and ethical burden of genetic testing, patients with hypertrophic cardiomyopathy (HCMs) could benefit from an estimate of genetic probability before the test. This study aimed to develop a simple tool to provide genotype prediction for HCMs.
Methods: A convolutional neural network (CNN) was built with the 12-lead electrocardiograms (ECGs) of 124 HCMs who underwent genetic testing (GT), externally tested by predicting the genotype in another cohort of HCMs (n = 54), and compared with the conventional methods (the Mayo and Toronto scores). Using a third cohort of HCMs (n = 76), the role of the network in risk stratification was explored by comparing the sudden cardiac death (SCD) risk score (HCM risk-SCD) across the predicted genotypes. Score-CAM was employed to provide a visual explanation of the network.
Results: Overall, 80 of 178 HCMs (45%) were genotype-positive. Using the 12-lead ECG as input, the network showed an area under the curve (AUC) of 0.89 (95% CI, 0.83–0.96) on the test set, outperforming the Mayo score (0.69 [95% CI, 0.65–0.78], p < 0.001) and the Toronto score (0.69 [95% CI, 0.64–0.75], p < 0.001). The network classified the third cohort into two groups (predicted genotype-negative vs. predicted genotype-positive). Compared with the former, patients predicted genotype-positive had a significantly higher HCM risk-SCD (0.04 ± 0.03 vs. 0.03 ± 0.02, p < 0.01). Visualization indicated that the prediction was heavily influenced by the limb leads.
Conclusions: The network demonstrated a promising ability in genotype prediction and risk assessment in HCM.
Keywords: Genotype, hypertrophic cardiomyopathy, electrocardiography, genetic testing, deep learning, convolutional neural network
Key messages
Patients with genotype-positive hypertrophic cardiomyopathy (HCM) have a higher risk of severe heart failure and sudden cardiac death (SCD). A deep learning network that reads the 12-lead electrocardiogram was developed; it outperformed the conventional methods in genotype prediction and showed promise for SCD risk assessment in HCM.
1. Introduction
As a common heritable disease, hypertrophic cardiomyopathy (HCM) is predominantly caused by variants in genes encoding the sarcomere proteins and is clinically characterized by unexplained left ventricular hypertrophy (LVH). The presence of the sarcomere-related mutation in patients with HCM was associated with an increased risk of developing ventricular arrhythmia, severe myocardial dysfunction, and sudden cardiac death (SCD) [1], highlighting the importance of genetic testing (GT) in the clinical practice of HCM. Also, the GT result of being genotype-positive warrants a cascade screening for family members at risk, leading to a more precise diagnosis and risk stratification.
Meanwhile, challenges to GT have come to clinicians’ attention. Regular GT requires well-equipped laboratories and professionals with expertise in genetic counseling, which potentially prevents families in underdeveloped areas from accessing such services, while GT via more comprehensive approaches is largely confined to research settings rather than routine clinical use. Also, despite the general reduction in costs, the financial burden remains a major reason for declining testing, owing to disparities in healthcare insurance policies worldwide. Moreover, peripheral blood remains the preferred sample source, which makes GT an invasive procedure.
Alternatively, patients could benefit from establishing the genetic probability before GT using a low-cost, non-invasive method that automatically identifies genotype-positive HCMs on a large scale, such as one based on the electrocardiogram (ECG). As a fast and widely available tool, the ECG retains a significant role in the clinical practice of HCM, and cardiac electrical abnormalities may be the only manifestation in the early stages of HCM. In some instances, a standard ECG could help differentiate HCM from phenocopies (e.g. Anderson-Fabry disease) while investigating unexplained LVH [2]. ECG-based features, such as LVH by Sokolow-Lyon criteria, abnormal Q waves, and repolarization abnormalities, serve as independent predictors of HCM development in sarcomere protein mutation carriers [3]. Recently, deep learning (DL) has been applied as a novel approach at the genetic level of analysis, showing promise in diagnostic and risk-assessment tasks [4,5].
The convolutional neural network (CNN), a DL architecture, was designed to extract key features from structured arrays of data using the convolution operation, and has been applied in several studies for automatic target identification [6,7]. Traditional, less advanced artificial intelligence (AI) architectures relied heavily on manually defined features. In contrast, a CNN learns features automatically, which is especially powerful in image recognition tasks [8]. Another advance is the introduction of transfer learning (TL), which frees the framework from training from scratch and extends the CNN to tasks with limited data, especially for rare diseases. Generally, TL uses a pre-trained model as the foundation and then fine-tunes it on a smaller dataset for the new task. Specifically, the features extracted from the initial task serve as the starting point for a second task, facilitating quicker and more effective learning. A previous study has shown that pre-training followed by fine-tuning on a small dataset improved the performance of a CNN in the classification of atrial fibrillation [9]. Therefore, to achieve a standard ECG-based detection of sarcomeric mutation in HCM, this pilot study used TL to obtain a pre-trained CNN and subsequently fine-tuned the network to distinguish genotype-positive (G+) from genotype-negative (G–) patients with HCM.
2. Methods
2.1. Study population
Patients aged ≥18 years who were diagnosed with HCM on an outpatient basis at the Qingchun Medical Center (QMC) and the Xiasha Medical Center (XMC) between 2019 and 2021 were enrolled in this study. The QMC and XMC are two independently operated tertiary medical centers. The QMC dataset (124 patients with HCM, of whom 56 were G+) was used to train the CNN model, which was then externally tested on the XMC dataset (54 patients with HCM, of whom 24 were G+). To further assess the generalizability and explore the role of the network in risk assessment, a third cohort (76 patients with HCM from QMC and XMC, genotype unknown) was assembled subsequently. Subjects in this study were thus collected from separate medical facilities at different times, which adds spatial and temporal diversity to the datasets. The diagnosis of HCM was confirmed by echocardiography as a maximal wall thickness ≥15 mm in one or more left ventricular myocardial segments. Patients with HCM phenocopies such as Fabry disease and amyloidosis were excluded. The baseline characteristics were collected at enrollment, and the GT was completed within one week. The patients of the third cohort were enrolled under the same criteria but did not undergo GT. The project was approved by the Research Ethics Committee and complies with the Declaration of Helsinki.
2.2. Ground truth
Patients underwent genetic testing for HCM using a commercially available kit (Qiagen DNA Blood Midi/Mini Kit, Qiagen GmbH, Hilden, Germany). Those carrying variants conventionally classified as pathogenic or likely pathogenic were labelled as G+ [10–12]. Otherwise (i.e. variants of unknown significance, likely benign, or benign), they were labelled as G– [10–12].
2.3. Acquisition of 12-lead ECG
A standard 10-second, 12-lead resting ECG was recorded at a sampling rate of 500 Hz at enrollment. For patients who had multiple ECGs, the ECG closest to the time of GT was used for the analysis. For patients who were scheduled for an intervention (e.g. septal ablation, surgical myomectomy), the ECG acquired prior to the procedure was analyzed.
2.4. Convolutional neural network model
ECG interpretation can be treated as a visual task, since the ECG signal is pseudocyclical and composed of morphological features [13,14]. To exploit the pseudocyclical nature of the ECG signal for data augmentation, the signals were extracted from the original 12-lead ECG images in PNG format (847 × 685) using a method similar to one described elsewhere [15]. In short, after image binarization and signal extraction, the ECG traces were cut into single beats and grouped into 12 clusters, one for each lead. For the reconstruction, one single beat was randomly selected from each lead cluster, and the 12 selected beats formed a new composite 12-lead ECG for further analysis. As such, the network was trained on 10,260 reconstructed ECGs from the QMC dataset (i.e. training set) and externally tested on 6,150 ECGs from the XMC dataset (i.e. testing set). A 5-fold cross-validation method was used in the training set to optimize the network, which was then tested by predicting G+ in the testing set. During the cross-validation, the training set was split at the patient level to avoid data contamination.
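The beat-recombination augmentation and the patient-level split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data layout (`beats_by_lead` as a dictionary of per-lead beat lists) and the function names `reconstruct_ecgs` and `patient_level_folds` are assumptions made for the example.

```python
import random

LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]

def reconstruct_ecgs(beats_by_lead, n_samples, seed=0):
    """Build composite 12-lead ECGs by drawing one beat per lead at random.

    beats_by_lead (hypothetical layout) maps each lead name to the list of
    single-beat segments extracted from one patient's recording.
    """
    rng = random.Random(seed)
    return [
        {lead: rng.choice(beats_by_lead[lead]) for lead in LEADS}
        for _ in range(n_samples)
    ]

def patient_level_folds(patient_ids, k=5, seed=0):
    """Assign unique patient IDs to k folds, so that every reconstructed ECG
    of a given patient falls in the same fold (no leakage across folds)."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    return [unique_ids[i::k] for i in range(k)]

# Toy usage: strings stand in for the beat images.
beats = {lead: [f"{lead}_beat{i}" for i in range(3)] for lead in LEADS}
ecgs = reconstruct_ecgs(beats, n_samples=4)
folds = patient_level_folds(["p1", "p1", "p2", "p3", "p4", "p5", "p6"], k=5)
```

Splitting on patient IDs rather than on individual reconstructed ECGs matters here because many composites share beats from the same recording.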
The VGG-16 architecture, a deep convolutional network, had been proposed for similar ECG-related problems in a previous study with successful results [16]. Hence, the VGG-16 was pre-trained on the ImageNet dataset [17], then fine-tuned with the reconstructed ECGs of the QMC dataset, and tested on the ECGs of the XMC dataset. Softmax was used as the activation function to calculate the category probabilities. The network output two values corresponding to the probabilities of G+ and G–, respectively. The network was optimized using the Adam optimizer with binary cross-entropy as the loss function. A grid search was employed to optimize the hyperparameters [18]: the number of epochs (50; the loss did not decrease further beyond this point), the learning rate (10⁻⁴ selected; 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵ were tested), and the decay rate (10⁻⁴ selected; 10⁻², 10⁻³, 10⁻⁴, and 10⁻⁵ were tested). The model with the highest validation accuracy was saved. Then, the receiver-operating characteristic (ROC) curve of the validation set was used to determine the optimal probability threshold (maximum Youden index) for the binary classification. The optimal probability threshold was 9%, i.e. the test was considered positive for G+ if the output was over 9%. The same threshold was applied to the external testing for the model assessment. The predicted class of each patient was the label with the largest summed probability across the ECG segments.
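The two decision rules in this paragraph, threshold selection by the maximum Youden index and patient-level aggregation by summed probability, can be illustrated with a short sketch. The function names and toy numbers are hypothetical, and the sketch calls a sample positive when its probability is at or above the cut-off, which may differ slightly from the authors' "over 9%" convention.

```python
def youden_threshold(labels, probs):
    """Return (cut-off, J) maximising J = sensitivity + specificity - 1.

    labels: 1 for G+, 0 for G-; probs: predicted probability of G+.
    A sample is called positive when its probability is >= the cut-off.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best_t = -1.0, None
    for t in sorted(set(probs)):
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j

def patient_prediction(segment_probs):
    """Pick the label with the largest summed probability over all
    reconstructed ECG segments of one patient.

    segment_probs: list of (p_negative, p_positive) pairs.
    """
    sum_neg = sum(p[0] for p in segment_probs)
    sum_pos = sum(p[1] for p in segment_probs)
    return "G+" if sum_pos > sum_neg else "G-"

# Toy validation data (hypothetical values, not from the study).
threshold, j_index = youden_threshold([0, 0, 1, 1, 1],
                                      [0.02, 0.05, 0.10, 0.40, 0.04])
label = patient_prediction([(0.7, 0.3), (0.2, 0.8), (0.4, 0.6)])
```

The source does not spell out how the fixed threshold and the summed-probability rule are combined, so the two functions are shown independently.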
2.5. Reference models
In addition to the ECG-based network, two conventional scoring systems (the Mayo score [19] and the Toronto score [11]) were employed in this study as references. The Mayo score was calculated by assigning −1 point for the presence of hypertension and 1 point for each of the following variables: age at diagnosis ≤45 years, reverse curvature septal morphology, maximal LV wall thickness ≥20 mm, family history of HCM, and family history of sudden cardiac death (SCD). The Toronto score was calculated using the following variables: age at diagnosis, sex, septal morphology, presence of hypertension, family history of HCM, and the ratio of the maximal LV wall thickness to posterior wall thickness.
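The point assignment of the Mayo score as described in this section can be written out as a small function. This is a sketch of the rule as stated in the text; the function and parameter names are illustrative, and the original publication should be consulted for the exact variable definitions.

```python
def mayo_score(age_at_diagnosis, reverse_curve_septum, max_wall_thickness_mm,
               family_history_hcm, family_history_scd, hypertension):
    """Mayo genotype score as described above: +1 per positive predictor,
    -1 for hypertension. The result ranges from -1 to 5."""
    score = 0
    score += 1 if age_at_diagnosis <= 45 else 0
    score += 1 if reverse_curve_septum else 0
    score += 1 if max_wall_thickness_mm >= 20 else 0
    score += 1 if family_history_hcm else 0
    score += 1 if family_history_scd else 0
    score -= 1 if hypertension else 0
    return score

# Hypothetical patients, for illustration only.
young_familial = mayo_score(40, True, 22, True, False, False)
older_hypertensive = mayo_score(60, False, 16, False, False, True)
```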
2.6. Network applied to risk stratification
Prior studies reported that HCM patients with G+ have a greater lifetime risk of ventricular tachycardia, heart failure, and SCD than those with G– [10,20–22]. Since the network was designed to predict the genotype, it was expected to be applicable to risk stratification (e.g. a prediction of G+ corresponding to a high risk of SCD in HCM).
For SCD risk assessment, the HCM risk-SCD score, recommended by the 2014 European Society of Cardiology (ESC) guidelines [23], was employed to provide a five-year SCD risk estimation for HCMs of the third cohort. The HCM risk-SCD score was a probability estimation of the five-year SCD risk, which was calculated using an algorithm [24] involving variables of age, family history of SCD, syncope, left ventricular outflow tract (LVOT) gradient, maximum LV wall thickness, left atrial dimension, and non-sustained ventricular tachycardia on Holter monitoring. The network was deployed to read the ECGs of the third cohort and classified the patients into two groups (i.e. predicted G + vs. predicted G-). The inter-group comparison in the HCM risk-SCD score was performed.
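For orientation, the five-year risk estimate can be sketched as below. The coefficients are those of the published 2014 HCM Risk-SCD equation as commonly reproduced; they are not given in this article and should be verified against reference [24] before any use. The function name and argument names are illustrative.

```python
import math

def hcm_risk_scd(age_years, max_wall_thickness_mm, left_atrial_diameter_mm,
                 lvot_gradient_mmhg, family_history_scd, nsvt, syncope):
    """Five-year SCD probability: 1 - 0.998 ** exp(prognostic index).

    Binary predictors (family_history_scd, nsvt, syncope) are passed as
    booleans. Coefficients are assumed from the published model; verify
    them against the primary source.
    """
    prognostic_index = (
        0.15939858 * max_wall_thickness_mm
        - 0.00294271 * max_wall_thickness_mm ** 2
        + 0.0259082 * left_atrial_diameter_mm
        + 0.00446131 * lvot_gradient_mmhg
        + 0.4583082 * int(family_history_scd)
        + 0.82639195 * int(nsvt)
        + 0.71650361 * int(syncope)
        - 0.01799934 * age_years
    )
    return 1.0 - 0.998 ** math.exp(prognostic_index)

# Hypothetical patient: 50 years, 20 mm wall, 42 mm left atrium, 10 mmHg gradient.
baseline_risk = hcm_risk_scd(50, 20, 42, 10, False, False, False)
risk_with_nsvt = hcm_risk_scd(50, 20, 42, 10, False, True, False)
```

The output is a probability between 0 and 1, on the same scale as the group means reported in the Results (e.g. 0.04 vs. 0.03).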
2.7. Visualization
To make the network more transparent, the score-CAM was employed as a class-discriminative localization technique to produce a visual explanation [25]. The score-CAM approach obtains the weight of each activation map through its forward passing score on the target class [25]. Using the combination of weights and activation maps, this study presented a representative visualization to reveal the ECG features which were applied to develop the predictive model.
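The weighting step of Score-CAM can be illustrated with a simplified, dependency-free sketch. It assumes the activation maps have already been upsampled to the input size and that `score_fn` (a stand-in for the trained network) returns the target-class score of a masked image; the published method additionally subtracts a baseline score, which is omitted here.

```python
import math

def score_cam(image, activation_maps, score_fn):
    """Simplified Score-CAM on 2-D lists.

    Each activation map is min-max normalised, used to mask the input, and
    scored by a forward pass; softmax over the scores gives the map weights.
    """
    def normalise(m):
        lo = min(min(row) for row in m)
        hi = max(max(row) for row in m)
        span = hi - lo
        return [[(v - lo) / span if span else 0.0 for v in row] for row in m]

    scores = []
    for amap in activation_maps:
        mask = normalise(amap)
        masked = [[x * w for x, w in zip(img_row, mask_row)]
                  for img_row, mask_row in zip(image, mask)]
        scores.append(score_fn(masked))

    # Softmax turns the forward-pass scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]

    height, width = len(image), len(image[0])
    # Weighted sum of activation maps followed by ReLU.
    cam = [[max(0.0, sum(w * amap[i][j]
                         for w, amap in zip(weights, activation_maps)))
            for j in range(width)] for i in range(height)]
    return cam, weights

# Toy example: the "model" responds only to the top-left pixel.
image = [[1.0, 1.0], [1.0, 1.0]]
maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
cam, weights = score_cam(image, maps, lambda img: img[0][0])
```

In the toy example the first map receives the larger weight because masking the input with it preserves the pixel the scoring function depends on.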
2.8. Statistical analysis
Normally distributed continuous variables were presented as mean ± standard deviation, while non-normally distributed ones were presented as median [interquartile range]. For comparisons of the characteristics between G+ and G–, Student’s t-test, the Chi-squared test, or Fisher’s exact test was used, as appropriate. The area under the receiver-operating-characteristic curve (AUC) of the ECG-only network was compared with those of the Mayo score and the Toronto score using non-parametric receiver operating characteristic estimation and the net reclassification improvement (NRI) index. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were also calculated from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). Statistical significance was defined as a p-value of <0.05. The statistical analyses were performed using Python (version 3.6) and R software (version 3.7.0).
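The four classification metrics follow directly from the confusion-matrix counts. A minimal sketch, with a hypothetical function name and made-up counts that are not taken from the study:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV, and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true G+ that were detected
        "specificity": tn / (tn + fp),  # share of true G- that were ruled out
        "ppv": tp / (tp + fp),          # share of predicted G+ that are G+
        "npv": tn / (tn + fn),          # share of predicted G- that are G-
    }

# Illustrative counts only.
metrics = diagnostic_metrics(tp=20, fp=6, tn=24, fn=4)
```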
3. Results
A total of 178 HCM patients underwent GT in the QMC and XMC datasets. Overall, 80 (45%) patients were G+. Patients who were G+ were younger, had worse functional class, and more often had a family history of SCD, but were less likely to have hypertension as a comorbidity (Table 1). There was no significant difference in maximal LV wall thickness, the proportion of obstructive HCM, or LVEF between G+ and G–.
Table 1.
Baseline characteristics of patients at enrollment.
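| Characteristic | Overall: G− (n = 98) | Overall: G+ (n = 80) | Overall: p-value | QMC dataset (training set): G− (n = 68) | QMC dataset (training set): G+ (n = 56) | QMC dataset (training set): p-value | XMC dataset (testing set): G− (n = 30) | XMC dataset (testing set): G+ (n = 24) | XMC dataset (testing set): p-value |
|---|---|---|---|---|---|---|---|---|---|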
| Demography | |||||||||
| Age | 59.48 ± 13.25 | 52.86 ± 15.07 | <0.01 | 59.07 ± 13.10 | 53.45 ± 14.88 | 0.03 | 60.40 ± 13.78 | 51.50 ± 15.75 | 0.03 |
| Male | 75 (76.53) | 47 (58.75) | 0.01 | 51 (75.00) | 33 (58.93) | 0.06 | 24 (80.00) | 14 (58.33) | 0.08 |
| Medical history | |||||||||
| Hypertension | 58 (59.18) | 26 (32.50) | <0.01 | 35 (51.47) | 18 (32.14) | 0.03 | 23 (76.67) | 8 (33.33) | <0.01 |
| Family history of HCM | 13 (13.27) | 30 (37.50) | <0.01 | 9 (13.24) | 19 (33.93) | 0.01 | 4 (13.33) | 11 (45.83) | 0.01 |
| Family history of SCD | 4 (4.08) | 14 (17.50) | <0.01 | 3 (4.41) | 9 (16.07) | 0.04 | 1 (3.33) | 5 (20.83) | 0.08 |
| Clinical | |||||||||
| NYHA | | | 0.02 | | | 0.02 | | | 1.00 |
| Class I | 93 (94.90) | 68 (85.00) | | 65 (95.59) | 46 (82.14) | | 28 (93.33) | 22 (91.67) | |
| Class II | 5 (5.10) | 7 (8.75) | | 3 (4.41) | 5 (8.93) | | 2 (6.67) | 2 (8.33) | |
| Class III | 0 (0.00) | 5 (6.25) | | 0 (0.00) | 5 (8.93) | | 0 (0.00) | 0 (0.00) | |
| Class IV | 0 (0.00) | 0 (0.00) | | 0 (0.00) | 0 (0.00) | | 0 (0.00) | 0 (0.00) | |
| Echocardiography | |||||||||
| LVEF | 69.03 ± 7.54 | 66.43 ± 10.61 | 0.06 | 69.44 ± 7.67 | 65.39 ± 10.97 | 0.02 | 68.10 ± 7.26 | 68.86 ± 9.49 | 0.74 |
| Maximal left ventricular wall thickness | 22.38 ± 7.58 | 23.54 ± 7.71 | 0.32 | 21.98 ± 7.31 | 23.28 ± 7.47 | 0.33 | 23.30 ± 8.22 | 24.15 ± 8.38 | 0.71 |
| Left ventricular outflow tract obstruction | 10 (10.20) | 12 (15.00) | 0.33 | 7 (10.29) | 8 (14.29) | 0.50 | 3 (10.00) | 4 (16.67) | 0.69 |
| Ratio of maximal wall thickness to posterior wall thickness | | | 0.01 | | | 0.01 | | | 0.10 |
| ≤1.46 | 22 (22.45) | 21 (26.25) | | 11 (16.18) | 17 (30.36) | | 11 (36.67) | 4 (16.67) | |
| 1.47–1.70 | 28 (28.57) | 10 (12.50) | | 21 (30.88) | 8 (14.29) | | 7 (23.33) | 2 (8.33) | |
| 1.71–1.92 | 18 (18.37) | 9 (11.25) | | 16 (23.53) | 5 (8.93) | | 2 (6.67) | 4 (16.67) | |
| 1.93–2.26 | 16 (16.33) | 15 (18.75) | | 11 (16.18) | 11 (19.64) | | 5 (16.67) | 4 (16.67) | |
| ≥2.27 | 14 (14.29) | 25 (31.25) | | 9 (13.24) | 15 (26.79) | | 5 (16.67) | 10 (41.67) | |
| Genotype | |||||||||
| MYBPC3 | | 33 (41.25) | | | 22 (39.29) | | | 11 (45.83) | |
| MYH7 | | 26 (32.50) | | | 18 (32.14) | | | 8 (33.33) | |
| TNNT2 | | 7 (8.75) | | | 6 (10.71) | | | 1 (4.17) | |
| MYL2 | | 4 (5.00) | | | 3 (5.36) | | | 1 (4.17) | |
| MYL3 | | 2 (2.50) | | | 1 (1.79) | | | 1 (4.17) | |
| Multiple | | 4 (5.00) | | | 2 (3.57) | | | 2 (8.33) | |
| Other | | 4 (5.00) | | | 4 (7.14) | | | 0 (0.00) | |
G–: genotype negativity; G+: genotype positivity; HCM: hypertrophic cardiomyopathy; SCD: sudden cardiac death; LVEF: left ventricular ejection fraction; LVH: left ventricular hypertrophy; NYHA: New York Heart Association.
When tested on ECGs that were not used in the training phase, the network outperformed the reference scores (Mayo and Toronto) in predicting G+, with an AUC of 0.89 (95% CI, 0.83–0.96; p < 0.01; Table 2). The network achieved a significant net reclassification improvement over the reference scores, indicating that more patients were reclassified correctly by the network. Compared with the reference scores, the network had a higher sensitivity (0.84; 95% CI, 0.81–0.96), positive predictive value (0.87; 95% CI, 0.74–0.94), and negative predictive value (0.80; 95% CI, 0.66–0.91).
Table 2.
Performance in genotype prediction between the Mayo score, Toronto score, and the network.
| Prediction model | AUC (95% CI) | P-value (AUC) | NRI (95% CI) | P-value (NRI) | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|---|---|
| Mayo score (reference) | 0.69 (0.65–0.78) | Reference | Reference | Reference | 0.65 (0.54–0.74) | 0.75 (0.68–0.85) | 0.66 (0.61–0.80) | 0.72 (0.67–0.79) |
| Toronto score (reference) | 0.69 (0.64–0.75) | Reference | Reference | Reference | 0.60 (0.52–0.67) | 0.79 (0.70–0.84) | 0.70 (0.56–0.80) | 0.70 (0.67–0.75) |
| The network | 0.89 (0.83–0.96) | <0.01*; <0.01† | 0.72 (0.49–0.96)‡; 0.81 (0.56–1.07)§ | <0.01‡; <0.01§ | 0.84 (0.81–0.96) | 0.78 (0.64–0.95) | 0.87 (0.74–0.94) | 0.80 (0.66–0.91) |
AUC: area under the receiver operating characteristics curve; NRI: net reclassification improvement; NPV: negative predictive value; PPV: positive predictive value.
*P value was calculated to compare AUC of the Mayo score with that of the network.
†P value was calculated to compare AUC of the Toronto score with that of the network.
‡Continuous NRI and associated p-values were displayed for the Mayo score versus the network.
§Continuous NRI and associated p-values were displayed for the Toronto score versus the network.
The third cohort (76 HCMs, mean age: 55.2 ± 13.9 years, 24 female and 52 male) was created to assess the possible role of the network in risk stratification (Table 3). By reading the ECGs, the network identified 25 patients as predicted G+, while the remaining 51 were labelled as predicted G–. The HCM risk-SCD score across the predicted genotypes is shown in Figure 1, indicating a significantly higher HCM risk-SCD score in patients predicted G+ (0.04 ± 0.03 vs. 0.03 ± 0.02, p < 0.01). To provide a visual explanation of the network, Score-CAM was performed and the result is displayed in Figure 2. This weight-based method indicated that the network output was heavily influenced by the limb leads (I, II, and aVR).
Table 3.
Baseline characteristics of the third cohort by the predicted genotypes.
| Characteristic | Predicted G− (n = 51) | Predicted G+ (n = 25) | p-value |
|---|---|---|---|
| Demography | |||
| Age | 56.02 ± 16.24 | 53.60 ± 7.16 | 0.48 |
| Male | 35 (68.63) | 17 (68.00) | 0.96 |
| Medical history | |||
| Hypertension | 26 (50.98) | 7 (28.00) | 0.06 |
| Family history of HCM | 2 (3.92) | 9 (36.00) | <0.01 |
| Family history of SCD | 6 (11.76) | 7 (28.00) | 0.08 |
| Syncope | 7 (13.73) | 6 (24.00) | 0.26 |
| Clinical | |||
| NYHA | | | 0.88 |
| Class I | 19 (37.25) | 9 (36.00) | |
| Class II | 22 (43.14) | 12 (48.00) | |
| Class III | 9 (17.65) | 3 (12.00) | |
| Class IV | 1 (1.96) | 1 (4.00) | |
| Non-sustained VT | 7 (13.73) | 11 (44.00) | <0.01 |
| Echocardiography | |||
| LVEF | 70.29 ± 6.70 | 69.83 ± 7.73 | 0.79 |
| Maximal left ventricular wall thickness | 20.85 ± 4.17 | 21.95 ± 4.16 | 0.30 |
| Left atrial dimension | 42.72 ± 6.83 | 42.71 ± 7.15 | 0.99 |
| LVOT gradient | 10.24 (38.43) | 9.00 (20.68) | 0.82 |
G–: genotype negativity; G+: genotype positivity; HCM: hypertrophic cardiomyopathy; SCD: sudden cardiac death; LVEF: left ventricular ejection fraction; NYHA: New York Heart Association; LVOT: left ventricular outflow tract; VT: ventricular tachycardia. LVOT gradient was presented as median (interquartile range).
Figure 1.
Comparison of the HCM risk-SCD score between the predicted genotypes (predicted genotype-positive vs. predicted genotype-negative). A higher HCM risk-SCD score denotes an increased risk of SCD. The lower bound, the upper bound, and the intersecting line refer to the 25th percentile, the 75th percentile, and the median, respectively. G-, genotype-negative; G+, genotype-positive; HCM, hypertrophic cardiomyopathy; SCD, sudden cardiac death.
Figure 2.
Spatial display of features extracted by the network. The key features (red) were distributed over the limb leads.
4. Discussion
In this study, a TL-derived network that reads the standard 12-lead ECG was developed to predict the genotype and assess the risk of SCD in HCM. The approach is simple and feasible, since it requires only a standard ECG and no other information, while achieving a better performance than conventional methods.
It is noteworthy that the interpretation of GT results is often complicated rather than straightforward [26] and requires proper genetic counseling before and after testing. Despite rapid growth during the past 30 years, access to genetic counseling is still limited in many countries [27]. Significant disparities in healthcare insurance coverage of GT were found among world regions [28], causing a major financial burden and barriers to accessing GT [29]. For individuals at risk, this limited access to GT in turn carries a considerable psychological burden arising from uncertainty about the subsequent consequences. As a low-cost and widely available tool, the ECG could act as an adjunctive means of estimating genetic probability, minimizing psychological harm.
Our network provided a more accurate prediction than conventional methods. The conventional scoring systems for genotype prediction are rather complex and rely heavily on echocardiographic assessment, which is subject to inter- and intra-observer variability [30]. Also, patients with “occult” or “masked” hypertension may be unaware of their true blood pressure status, which leads to erroneous estimation by the scoring systems (e.g. the scoring term “the presence of hypertension”). Similarly, it is not uncommon for patients to have no knowledge of, or only partial information about, their family history, which is insufficient to support a confident estimation. The moderate performance of conventional methods may be attributed to such missing or misleading information. A standard 12-lead ECG, on the other hand, is a common and less operator-dependent test [31]. Studies have shown that ECG-derived parameters correlate with the results of GT. The absence of negative T waves and the ECG criteria of LVH were more frequently observed in G+ [32,33]. However, these ECG markers are predefined measurements, which prevents any subtle genotype-related ECG patterns from being discovered. Moreover, these ECG parameters need to be used along with other G+ risk factors. As such, instead of confining itself to classic ECG criteria or predefined measurements (e.g. PR interval, QRS duration), this study used a CNN to automatically extract features that may not be apparent to the human eye or to conventional automated algorithms. Additionally, sarcomere variants have been associated with a greater risk of adverse outcomes in G+ than in G– patients [10]. The assumption that patients predicted as G+ by our network would have a higher SCD risk score was confirmed in this study, demonstrating potential for selecting candidates for ICD implantation.
The Score-CAM indicated that the features frequently used to distinguish between HCM patients with G+ and G– were spatially distributed over the limb leads. This finding is consistent with prior studies. The presence of a negative T-wave in lead I was strongly correlated with sarcomeric mutations [32]. Additionally, this study suggested that lead aVR contributed to the genotype prediction. Given that a prior study reported a higher frequency of G+ with right ventricular involvement [34], our finding suggests that lead aVR may be a candidate marker of G+ in the HCM population.
Study limitations
Several limitations should be considered when interpreting this study. First, the small sample drawn from a homogeneous population is inevitably subject to selection bias, which limits generalization. Nevertheless, efforts were made to increase generalizability and reduce over-fitting. Cross-validation was implemented in this study. Moreover, the trained network performed well on unseen ECGs collected from a separate medical center. Second, G+ was defined within the scope of current knowledge. It is possible that the classification of the identified variants could change over time. Nevertheless, in the context of current clinical practice, the guideline recommends that family screening and SCD risk assessment proceed with the support of GT [35]. Therefore, our work contributes to clinical practice by providing a widely available solution for targeting a specific population. Third, due to the limited sample size, we used a dichotomous classification (G+ vs. G–) as the outcome, leaving specific mutations (e.g. MYBPC3 and MYH7) unaddressed. As an advantage of the deep learning approach, our network could be optimized to identify specific sarcomere mutations as the sample size grows. Finally, it would be more convincing to assess the network’s performance in risk stratification by tracking the outcomes of each patient during long-term follow-up. However, due to limited resources, it was not feasible to monitor each individual closely and capture all the outcome events. Therefore, our network was compared with the well-known SCD risk assessments. These risk assessments, validated by several studies [36–38], are deemed reliable and are recommended by the ESC and ACC/AHA guidelines as the standard protocol for risk stratification in HCM.
5. Conclusions
This study developed a predictive tool to establish the genotype of HCM based on a standard 12-lead ECG rather than relying on complex information and subjective echocardiographic assessment. Compared to conventional methods, the network demonstrated superior performance in genotype prediction and was able to provide an assessment of the risk of SCD.
Acknowledgment
Many thanks to Xiang Zhou and Yaxun Sun for their help.
Funding Statement
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Author contributions
Conception and design of study: Laite Chen, Guosheng Fu, Chenyang Jiang; acquisition of data: Laite Chen; analysis and/or interpretation of data: Laite Chen; Drafting the manuscript: Laite Chen, Guosheng Fu, Chenyang Jiang; All authors approved the final manuscript and agree to be accountable for all aspects of the work.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Data availability statement
The data will be shared on reasonable request with the corresponding author.
References
- 1.Maron BJ, Maron MS.. Hypertrophic cardiomyopathy. Lancet. 2013;381(9862):1–9. doi: 10.1016/S0140-6736(12)60397-3. [DOI] [PubMed] [Google Scholar]
- 2.Vitale G, Ditaranto R, Graziani F, et al. Standard ECG for differential diagnosis between Anderson-Fabry disease and hypertrophic cardiomyopathy. Heart. 2022;108(1):54–60. doi: 10.1136/heartjnl-2020-318271. [DOI] [PubMed] [Google Scholar]
- 3.Lorenzini M, Norrish G, Field E, et al. Penetrance of hypertrophic cardiomyopathy in sarcomere protein mutation carriers. J Am Coll Cardiol. 2020;76(5):550–559. doi: 10.1016/j.jacc.2020.06.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Fujinami-Yokokawa Y, Ninomiya H, Liu X, et al. Prediction of causative genes in inherited retinal disorder from fundus photography and autofluorescence imaging using deep learning techniques. Br J Ophthalmol. 2021;105(9):1272–1279. doi: 10.1136/bjophthalmol-2020-318544. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Clapp MA, McCoy TH.. The potential of big data for obstetrics discovery. Curr Opin Endocrinol Diabetes Obes. 2021;28(6):553–557. doi: 10.1097/MED.0000000000000679. [DOI] [PubMed] [Google Scholar]
- 6.Kleppe A, Skrede OJ, De Raedt S, et al. Designing deep learning studies in cancer diagnostics. Nat Rev Cancer. 2021;21(3):199–211. doi: 10.1038/s41568-020-00327-9. [DOI] [PubMed] [Google Scholar]
- 7.Routhier E, Mozziconacci J.. Genomics enters the deep learning era. PeerJ. 2022;10:e13613. doi: 10.7717/peerj.13613. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Rostami B, Anisuzzaman DM, Wang C, et al. Multiclass wound image classification using an ensemble deep CNN-based classifier. Comput Biol Med. 2021;134:104536. doi: 10.1016/j.compbiomed.2021.104536. [DOI] [PubMed] [Google Scholar]
- 9.Weimann K, Conrad TOF.. Transfer learning for ECG classification. Sci Rep. 2021;11(1):5251. doi: 10.1038/s41598-021-84374-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Ho CY, Day SM, Ashley EA, et al. Genotype and lifetime burden of disease in hypertrophic cardiomyopathy: insights from the sarcomeric human cardiomyopathy registry (SHaRe). Circulation. 2018;138(14):1387–1398. doi: 10.1161/CIRCULATIONAHA.117.033200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Gruner C, Ivanov J, Care M, et al. Toronto hypertrophic cardiomyopathy genotype score for prediction of a positive genotype in hypertrophic cardiomyopathy. Circ Cardiovasc Genet. 2013;6(1):19–26. doi: 10.1161/CIRCGENETICS.112.963363. [DOI] [PubMed] [Google Scholar]
- 12.Bos JM, Will ML, Gersh BJ, et al. Characterization of a phenotype-based genetic test prediction score for unrelated patients with hypertrophic cardiomyopathy. Mayo Clin Proc. 2014;89(6):727–737. doi: 10.1016/j.mayocp.2014.01.025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Sugrue A, Noseworthy PA, Kremen V, et al. Identification of concealed and manifest long QT syndrome using a novel T wave analysis program. Circ: arrhythmia and Electrophysiology. 2016;9(7):e003830. doi: 10.1161/CIRCEP.115.003830. [DOI] [PubMed] [Google Scholar]
- 14.Gholam-Hosseini H, Nazeran H.. Detection and extraction of the ECG signal parameters Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society Vol20 Biomedical Engineering Towards the Year 2000 and Beyond (Cat No98CH36286) 1998:p. 127–130. vol.1. doi: [Google Scholar]
- 15. Chiou YA, Syu JY, Wu SY, et al. Electrocardiogram lead selection for intelligent screening of patients with systolic heart failure. Sci Rep. 2021;11(1):1948. doi: 10.1038/s41598-021-81374-6.
- 16. Irmak E. COVID-19 disease diagnosis from paper-based ECG trace image data using a novel convolutional neural network model. Phys Eng Sci Med. 2022;45(1):167–179. doi: 10.1007/s13246-022-01102-w.
- 17. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009:248–255.
- 18. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13(1):281–305.
- 19. Bonaventura J, Norambuena P, Tomasov P, et al. The utility of the Mayo score for predicting the yield of genetic testing in patients with hypertrophic cardiomyopathy. Arch Med Sci. 2019;15(3):641–649. doi: 10.5114/aoms.2018.78767.
- 20. Nguyen MB, Mital S, Mertens L, et al. Pediatric hypertrophic cardiomyopathy: exploring the genotype-phenotype association. J Am Heart Assoc. 2022;11(5):e024220. doi: 10.1161/JAHA.121.024220.
- 21. Georgiopoulos G, Figliozzi S, Pateras K, et al. Comparison of demographic, clinical, biochemical, and imaging findings in hypertrophic cardiomyopathy prognosis: a network meta-analysis. JACC Heart Fail. 2023;11(1):30–41. doi: 10.1016/j.jchf.2022.08.022.
- 22. Miron A, Lafreniere-Roula M, Steve Fan CP, et al. A validated model for sudden cardiac death risk prediction in pediatric hypertrophic cardiomyopathy. Circulation. 2020;142(3):217–229. doi: 10.1161/CIRCULATIONAHA.120.047235.
- 23. Elliott PM, Anastasakis A, Borger MA, et al. 2014 ESC guidelines on diagnosis and management of hypertrophic cardiomyopathy: the task force for the diagnosis and management of hypertrophic cardiomyopathy of the European society of cardiology (ESC). Eur Heart J. 2014;35(39):2733–2779. doi: 10.1093/eurheartj/ehu284.
- 24. O’Mahony C, Jichi F, Pavlou M, et al. A novel clinical risk prediction model for sudden cardiac death in hypertrophic cardiomyopathy (HCM risk-SCD). Eur Heart J. 2014;35(30):2010–2020. doi: 10.1093/eurheartj/eht439.
- 25. Wang H, Wang Z, Du M, et al. Score-CAM: score-weighted visual explanations for convolutional neural networks. 2019.
- 26. De Backer J, Bondue A, Budts W, et al. Genetic counselling and testing in adults with congenital heart disease: a consensus document of the ESC working group of grown-up congenital heart disease, the ESC working group on aorta and peripheral vascular disease and the European society of human genetics. Eur J Prev Cardiol. 2020;27(13):1423–1435. doi: 10.1177/2047487319854552.
- 27. Abacan M, Alsubaie L, Barlow-Stewart K, et al. The global state of the genetic counseling profession. Eur J Hum Genet. 2019;27(2):183–197. doi: 10.1038/s41431-018-0252-x.
- 28. Gatto EM, Walker RH, Gonzalez C, et al. Worldwide barriers to genetic testing for movement disorders. Eur J Neurol. 2021;28(6):1901–1909. doi: 10.1111/ene.14826.
- 29. Farmer MB, Bonadies DC, Pederson HJ, et al. Challenges and errors in genetic testing: the fifth case series. Cancer J. 2021;27(6):417–422. doi: 10.1097/PPO.0000000000000553.
- 30. Morbach C, Gelbrich G, Breunig M, et al. Impact of acquisition and interpretation on total inter-observer variability in echocardiography: results from the quality assurance program of the STAAB cohort study. Int J Cardiovasc Imaging. 2018;34(7):1057–1065. doi: 10.1007/s10554-018-1315-3.
- 31. Chang A, Cadaret LM, Liu K. Machine learning in electrocardiography and echocardiography: technological advances in clinical cardiology. Curr Cardiol Rep. 2020;22(12):161. doi: 10.1007/s11886-020-01416-9.
- 32. Robyns T, Breckpot J, Nuyens D, et al. Clinical and ECG variables to predict the outcome of genetic testing in hypertrophic cardiomyopathy. Eur J Med Genet. 2020;63(3):103754. doi: 10.1016/j.ejmg.2019.103754.
- 33. Lopes LR, Brito D, Belo A, et al. Genetic characterization and genotype-phenotype associations in a large cohort of patients with hypertrophic cardiomyopathy – an ancillary study of the Portuguese registry of hypertrophic cardiomyopathy. Int J Cardiol. 2019;278:173–179. doi: 10.1016/j.ijcard.2018.12.012.
- 34. Zhang Y, Zhu Y, Zhang M, et al. Implications of structural right ventricular involvement in patients with hypertrophic cardiomyopathy. Eur Heart J Qual Care Clin Outcomes. 2022;9(1):34–41. doi: 10.1093/ehjqcco/qcac008.
- 35. Ommen SR, Mital S, Burke MA, et al. 2020 AHA/ACC guideline for the diagnosis and treatment of patients with hypertrophic cardiomyopathy: a report of the American college of cardiology/American heart association joint committee on clinical practice guidelines. Circulation. 2020;142(25):e558–e631. doi: 10.1161/CIR.0000000000000937.
- 36. Liebregts M, Faber L, Jensen MK, et al. Validation of the HCM risk-SCD model in patients with hypertrophic cardiomyopathy following alcohol septal ablation. Europace. 2018;20(FI2):f198–f203. doi: 10.1093/europace/eux251.
- 37. Choi YJ, Kim HK, Lee SC, et al. Validation of the hypertrophic cardiomyopathy risk-sudden cardiac death calculator in Asians. Heart. 2019;105(24):1892–1897. doi: 10.1136/heartjnl-2019-315160.
- 38. Zegkos T, Tziomalos G, Parcharidou D, et al. Validation of the new American college of cardiology/American heart association guidelines for the risk stratification of sudden cardiac death in a large Mediterranean cohort with hypertrophic cardiomyopathy. Hellenic J Cardiol. 2022;63:15–21. doi: 10.1016/j.hjc.2021.06.005.
Data Availability Statement
The data will be shared on reasonable request with the corresponding author.


