PLoS One. 2023 Feb 2;18(2):e0280438. doi: 10.1371/journal.pone.0280438

Deep learning-based diagnosis of feline hypertrophic cardiomyopathy

Jinhyung Rho 1,2, Sung-Min Shin 3, Kyoungsun Jhang 4, Gwanghee Lee 4, Keun-Ho Song 5, Hyunguk Shin 6, Kiwon Na 7, Hyo-Jung Kwon 5, Hwa-Young Son 5,*
Editor: Jude Hemanth
PMCID: PMC9894403  PMID: 36730319

Abstract

Feline hypertrophic cardiomyopathy (HCM) is a common heart disease affecting 10–15% of all cats. Cats with HCM exhibit breathing difficulties, lethargy, and heart murmur; furthermore, feline HCM can result in sudden death. Among various methods and indices, radiography and ultrasound are the gold standards in the diagnosis of feline HCM. However, only 75% accuracy has been achieved using radiography alone. Therefore, we trained five residual architectures (ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception) using 231 ventrodorsal radiographic images of cats (143 HCM and 88 normal) and investigated the optimal architecture for diagnosing feline HCM through radiography. To ensure the generalizability of the data, the X-ray images were obtained from five independent institutions. In addition, 42 images were used for testing: 22 radiographic images were used in the prediction analysis, and 20 were used to evaluate the peeking phenomenon and the voting strategies. All models showed > 90% accuracy (ResNet50V2, 95.45%; ResNet152, 95.45%; InceptionResNetV2, 95.45%; MobileNetV2, 90.91%; and Xception, 95.45%). In addition, two voting strategies, softmax voting and majority voting, were applied to the five CNN models. The softmax voting strategy achieved 95% accuracy on the combined test data. Our findings demonstrate that an automated deep-learning system using a residual architecture can assist veterinary radiologists in screening HCM.

Introduction

Feline hypertrophic cardiomyopathy (HCM) is a common, chronic, and life-threatening heart disorder that affects 14.7% of cats aged > 7 months [1]. Unlike in humans, HCM is the most frequent and predominant cardiac disease in cats: of 306 primary cardiac disorders recorded from 1998 to 2005, 252 cases (82%) were cardiomyopathy and 48 cases (16%) were congenital heart disease [2, 3]. HCM is also among the ten most common causes of death in cats [4]. Feline HCM causes heart enlargement, with prominent hypertrophy of the ventricular wall and interventricular septum. Consequently, the left ventricular cavity is narrowed and the left atrium is dilated in affected cats [5]. Clinical signs of HCM include hyperventilation, syncope, and arterial thromboembolism [1]. Moreover, 25% of cats aged > 9 years are asymptomatic [4]. However, the most ominous sign of HCM is sudden death, which may occur within seconds and without any notable symptoms [6].

Diagnostic tools for HCM include physical examination, electrocardiography, ultrasound examination, X-ray imaging, and blood analysis [7]. Initially, incidental findings on physical examination, such as a murmur, gallop sound, or arrhythmia, may raise suspicion. Additionally, the level of N-terminal pro-brain-type natriuretic peptide (NT-proBNP) may be evaluated; however, radiology and echocardiography are considered the gold standards for diagnosing HCM. Specifically, a thoracic ventrodorsal (VD) radiograph of a cat with feline HCM shows a characteristic cardiac silhouette known as the valentine heart. Several methods for radiographic measurement of heart enlargement are in common use, including the vertebral heart size (VHS), modified vertebral heart size, and cardiothoracic ratio. Although quantitative, VHS shows only 51% accuracy in feline radiography, indicating that multiple examinations should be combined to confirm HCM [8].

Machine learning (ML) has become a dominant paradigm in radiologic and histological analyses. By combining computer science and statistics, ML allows self-learning and improves performance by learning from experience. Deep learning (DL) is a class of ML involving learning through neural networks with multiple representation levels corresponding to different levels of abstraction [9]. Owing to its high accuracy, DL has been applied to various veterinary diagnostic tasks, including cytology [10], fecal parasite detection [11], histopathology [12], and radiology [13]. Banzato et al. attempted to classify canine radiographic findings on thoracic latero-lateral (LL) images and observed a sensitivity > 90% [13]. However, few studies have investigated DL-based analyses of feline radiographic findings. Here, we investigated the DL-based classification of feline HCM in 273 thoracic VD radiographic images from Chungnam National University Veterinary Hospital and local veterinary hospitals using five deep neural networks, validated against diagnoses by a radiology specialist, and determined the optimal deep neural network model.

Materials and methods

Dataset

Image quality, proper positioning, and exposure were assessed by a veterinary radiology specialist, and only adequate images were selected. For the generalizability of the dataset, we obtained 273 feline thoracic VD X-ray images from five independent institutions: Chungnam National University Veterinary Medical Teaching Hospital and four local animal hospitals. The institutions providing radiographic images are listed in Table 1. For privacy reasons, the institution names are withheld. All radiographic images were obtained during routine examinations or follow-up. Owner consent was obtained to use these images for the study. Only the images and disease states (normal or HCM) were provided to the researchers. All images were anonymized, diagnosed by a veterinary radiology specialist, and approved for research use. Among the 273 radiographic images, 164 were diagnosed as HCM and 109 as normal; of these, 143 HCM and 88 normal images were used for the learning process. The remaining 21 HCM and 21 normal images were set aside as test images for model evaluation. To avoid the peeking phenomenon, we first used 11 normal and 11 HCM images to analyze model accuracy; 20 further images were then added to evaluate the peeking phenomenon and the voting strategies. S1 Table presents detailed information.

Table 1. The information of the dataset used in the experiment.

                Animal hospital
Case        A     B     C     D     E (revision)    Total
Normal      32    30    27    10    10              109
HCM         55    50    34    15    10              164

Image mask production

Various open-source libraries were used for image processing and the learning process. The Python os module was imported to handle directories and files. Images were processed into arrays using NumPy and Matplotlib. TensorFlow, Keras, OpenCV, Pillow, and scikit-image were imported to handle the deep neural network models. After learning, we used Matplotlib's pyplot to plot the graphs and confusion matrices. Fig 1 shows the detailed process of the image analysis. The image mask comprised one channel containing 0 or 1 per pixel; specifically, heart pixels were set to 1 and all other pixels to 0, which allowed the computer to recognize the heart's location. To maximize accuracy, a U-Net was implemented for image mask production. U-Net is a network architecture used for precise image segmentation. First, X-ray images were resized to 256 × 256 and converted into arrays with three numbers per pixel expressing the red, green, and blue intensities (three channels). The shape of an image array was expressed as (height, width, channels), so a converted image was expressed as (256, 256, 3). To train the U-Net, previously marked masks were prepared and processed together. Consequently, an image mask was produced indicating the heart location (256, 256, 1). Mask images were further processed using binary and Otsu thresholding to enhance accuracy.
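A minimal sketch of this mask post-processing step is shown below, assuming a U-Net has already been trained and saved under the hypothetical file name unet_heart.h5; the exact U-Net configuration used in this study is described only at a high level above, so the code is illustrative rather than a reproduction of the pipeline.

```python
# Sketch of mask prediction and Otsu thresholding (assumed file names and sizes).
import cv2
import numpy as np
import tensorflow as tf

def predict_heart_mask(image_path, unet_path="unet_heart.h5"):
    # Read the radiograph, resize to 256 x 256 and scale to [0, 1] (shape: 256, 256, 3).
    img = cv2.imread(image_path)
    img = cv2.resize(img, (256, 256)).astype(np.float32) / 255.0

    unet = tf.keras.models.load_model(unet_path, compile=False)
    prob = unet.predict(img[np.newaxis, ...])[0, :, :, 0]   # (256, 256, 1) -> (256, 256)

    # Convert the soft probabilities to an 8-bit image and apply binary + Otsu
    # thresholding, so every pixel becomes 0 (background) or 1 (heart).
    prob_u8 = (prob * 255).astype(np.uint8)
    _, mask = cv2.threshold(prob_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return (mask // 255).astype(np.uint8)                   # one-channel mask of 0s and 1s
```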

Fig 1. Schematic illustration of the learning process.


Pre-processing procedures

X-ray images for training were resized to 233 × 233 and converted to arrays (233, 233, 3), followed by pre-processing to enhance accuracy. For pre-processing, we implemented contrast stretching and overlay using the previously produced image mask. Contrast stretching expands a specific contrast range to enhance image distinction. Overlaying the image mask indicating the heart area improved heart detection and diagnosis. During the overlay process, the image mask was resized to 233 × 233 to fit the original images.
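The following sketch illustrates the two pre-processing operations; the stretch percentiles and the element-wise masking are assumptions, as the exact parameters are not reported above.

```python
# Illustrative contrast stretching and heart-mask overlay (assumed parameters).
import cv2
import numpy as np

def contrast_stretch(img, low_pct=2, high_pct=98):
    # Map the chosen intensity range onto the full 0-255 scale.
    lo, hi = np.percentile(img, (low_pct, high_pct))
    stretched = np.clip((img.astype(np.float32) - lo) / (hi - lo + 1e-8), 0, 1)
    return (stretched * 255).astype(np.uint8)

def overlay_heart_mask(img_233, mask_256):
    # Resize the U-Net mask to 233 x 233 so it matches the classification input,
    # then keep only the pixels inside the predicted heart region.
    mask = cv2.resize(mask_256, (233, 233), interpolation=cv2.INTER_NEAREST)
    return img_233 * mask[..., np.newaxis]     # broadcast the 1-channel mask over RGB
```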

Data augmentation

The pre-processed images were amplified by data augmentation, a valuable tool for increasing the diversity of a training set by applying random transformations; it is useful when data are scarce or the detection rate is low. The original training images were randomly rotated within 5 degrees, sheared in the range of 0.2, and zoomed in the range of 0.2 to obtain the final training dataset of 2,958 normal and 2,860 HCM images.
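A minimal sketch of these augmentation settings with the Keras ImageDataGenerator is given below; the directory layout and batch size are assumptions.

```python
# Augmentation settings matching the ranges described above (directory and batch size assumed).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=5,    # random rotations within +/- 5 degrees
    shear_range=0.2,     # shear intensity in the range 0.2
    zoom_range=0.2,      # zoom in/out by up to 20%
    rescale=1.0 / 255,
)

# Hypothetical directory with "normal/" and "hcm/" sub-folders of pre-processed images.
train_flow = augmenter.flow_from_directory(
    "train_images/", target_size=(233, 233), batch_size=16, class_mode="binary"
)
```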

Image classification

After pre-processing, five deep neural network models (ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception) were compared to investigate the optimal engine for HCM diagnosis from feline VD X-ray images. The deep neural network models were applied using the open-source libraries TensorFlow and Keras. The learning rate started at 0.000001 and increased linearly to 0.001 until the 13th epoch; from the 14th epoch, the learning rate decreased by exponential decay. The number of training repetitions (epochs) was set to 45, and training was designed to stop whenever the loss function stopped improving, which prevented overfitting. The learning process of the five architectures began from ImageNet pre-trained weights. On each pre-trained model without its classifier, we built additional classifier layers comprising a flattening layer and four fully connected dense layers. Between the dense layers, we used a dropout probability of 0.5 to prevent overfitting. After the learning process, we analyzed the accuracy and loss graphs; additionally, each model was tested using 22 X-ray images (11 normal and 11 HCM) not used in the learning process. Based on the test results, we drew a confusion matrix and receiver operating characteristic (ROC) curve, followed by a comparison of the accuracy of each deep neural network model. The weight (h5) files and Colab notebooks are included in the supplementary files.
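The sketch below shows how one backbone (Xception, as an example) could be configured as described; the sizes of the hidden dense layers, the decay rate, and the early-stopping patience are placeholders, as they are not specified above.

```python
# Transfer-learning setup with a warm-up/decay learning-rate schedule (assumed hyper-parameters).
import tensorflow as tf

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(233, 233, 3)
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),   # two-class output (index 0 = HCM, 1 = NORMAL)
])

def lr_schedule(epoch, lr):
    # Linear warm-up from 1e-6 to 1e-3 over the first 13 epochs, then exponential decay.
    if epoch < 13:
        return 1e-6 + (1e-3 - 1e-6) * (epoch + 1) / 13
    return 1e-3 * float(tf.math.exp(-0.1 * (epoch - 12)))

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
callbacks = [
    tf.keras.callbacks.LearningRateScheduler(lr_schedule),
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
]
# model.fit(train_flow, validation_data=val_flow, epochs=45, callbacks=callbacks)
```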

Metrics to compare neural network architectures

Based on the learning process of five neural networks, we analyzed various factors including accuracy, precision, recall, F1 score, sensitivity, specificity, and area under the curve (AUC) score. The definitions for the comparison metrics are as follows:

Accuracy = Number of correct predictions/total number of predictions.

Precision = True positive/(true positive + false positive)

Recall (sensitivity) = True positive/(true positive + false negative)

Specificity = True negative/(true negative + false positive)

The F1 score represents the harmonic mean of precision and recall. The AUC score refers to the two-dimensional area underneath the entire receiver operating characteristic (ROC) curve.
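For reference, these metrics can be computed from binary predictions and class probabilities with scikit-learn, as in the following sketch; variable names are illustrative, and class 1 is taken as the positive (HCM) class.

```python
# Hedged sketch of the reported metrics computed with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def evaluate(y_true, y_pred, y_prob_positive):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy":    accuracy_score(y_true, y_pred),
        "precision":   precision_score(y_true, y_pred),
        "recall":      recall_score(y_true, y_pred),             # sensitivity
        "specificity": tn / (tn + fp),
        "f1":          f1_score(y_true, y_pred),
        "auc":         roc_auc_score(y_true, y_prob_positive),   # area under the ROC curve
    }
```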

Ensemble strategy

In response to the revision request, we applied an ensemble strategy to enhance model accuracy, mitigate peeking, and minimize the misdiagnoses arising from a single architecture. We employed two ensemble strategies: majority voting and softmax voting [14]. Each model prediction (softmax output) is converted to 0 (HCM) or 1 (NORMAL) using the argmax function. Majority voting is performed on the sum of an odd number of binary prediction outputs; with five outputs, if the sum is greater than or equal to 3, the diagnosis is NORMAL, and otherwise it is HCM. In contrast, softmax voting adds the softmax outputs themselves, i.e., the two weight values for NORMAL and HCM from each model. The argmax function is then applied to the summed softmax outputs to decide which is larger, the NORMAL weight sum or the HCM weight sum, and the diagnosis follows the larger summed weight.
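The two voting rules can be expressed as in the following sketch, assuming each of the five models returns a softmax vector ordered as (HCM, NORMAL), consistent with the argmax convention described above.

```python
# Minimal sketch of the majority and softmax voting rules (class order assumed: 0 = HCM, 1 = NORMAL).
import numpy as np

def majority_vote(softmax_outputs):
    # softmax_outputs: array of shape (n_models, 2); n_models should be odd.
    binary = np.argmax(softmax_outputs, axis=1)          # per-model vote: 0 = HCM, 1 = NORMAL
    return "NORMAL" if binary.sum() >= (len(binary) // 2 + 1) else "HCM"

def softmax_vote(softmax_outputs):
    summed = np.sum(softmax_outputs, axis=0)             # element-wise sum of the two weights
    return "HCM" if np.argmax(summed) == 0 else "NORMAL"

# Example with five hypothetical model outputs for a single radiograph:
outputs = np.array([[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.8, 0.2], [0.7, 0.3]])
print(majority_vote(outputs), softmax_vote(outputs))     # both rules vote HCM here
```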

t-distributed stochastic neighbor embedding (t-SNE) visualization

The CNN models produce high-dimensional feature data that cannot be directly visualized or plotted. To visualize how each model distinguishes the radiographic image data, t-distributed stochastic neighbor embedding (t-SNE) was applied to the test datasets, converting the corresponding feature data of each model into two-dimensional feature points. The TSNE implementation in the scikit-learn library was applied to the test data and plotted in Fig 7. The feature data used for t-SNE were extracted before the flattening layer of each model.
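A minimal sketch of this visualization step is shown below; the feature extractor (the trained backbone whose output precedes the flattening layer), the test images, and the labels are assumed inputs, and the perplexity value is illustrative.

```python
# Sketch of the t-SNE plotting step (assumed inputs and hyper-parameters).
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(feature_extractor, test_images, test_labels):
    # feature_extractor: trained backbone returning pre-flatten feature maps;
    # test_images: (n, 233, 233, 3) array; test_labels: 0/1 array.
    features = feature_extractor.predict(test_images)
    flat = features.reshape(len(features), -1)            # (n_images, H*W*C)
    embedded = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(flat)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=test_labels, cmap="coolwarm")
    plt.xlabel("t-SNE dimension 1")
    plt.ylabel("t-SNE dimension 2")
    plt.show()
```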

Fig 7. t-SNE plotting of five deep learning models in all test data.


Hardware and software

For modeling and coding, Anaconda3 and Jupyter Notebook were run on a workstation with an AMD Ryzen 9 3900X 12-core processor, 64 GB RAM, and a GeForce RTX 3080 GPU.

Results

Learning process analysis

Fig 2 presents the results of the learning process. Epochs refer to cycles of repeated learning. The number of cycles correlated positively with training accuracy and negatively with the loss rate. There was a gradual increase in the accuracy of all training models, as evident from the accuracy and loss graphs. However, the validation accuracy of the ResNet50V2 model showed an irregular peak at the 10th epoch. The validation curves of ResNet152 showed three and two irregular peaks in the accuracy and loss graphs, respectively, and those of InceptionResNetV2 showed two and three irregular peaks in the accuracy and loss graphs, respectively. MobileNetV2 demonstrated two peaks in the accuracy and loss graphs, whereas Xception rarely showed irregular peaks.

Fig 2. Accuracy and loss graphs after training the architectures to detect feline HCM using VD X-ray images.


Five DL architectures were used: ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception. Epochs refer to the repetition of learning.

Model test and evaluation

After the learning process, each model predicted the diagnosis of the previously set-aside test images. Figs 3 and 4 show the prediction results. MobileNetV2 showed 90.91% accuracy, while ResNet50V2, ResNet152, InceptionResNetV2, and Xception each showed 95.45% accuracy. Although the accuracy was the same, the misdiagnosed image differed between models across the 22 test images, except for ResNet50V2 and ResNet152 (Figs 3 and 4).

Fig 3. Confusion matrix from the deep learning process of detecting feline HCM using VD X-ray images.


Five DL architectures were used; ResNet50V2, ResNet152, and InceptionResNetV2 are presented here. Twenty-two X-ray images were tested. The X-ray images on the right are the original images that the architecture could not classify.

Fig 4. Confusion matrix obtained from the deep learning process of detecting feline HCM using VD X-ray images (MobileNetV2 and Xception).


Cont’d. Twenty-two X-ray images were tested. The X-ray images on the right are the original images that the architecture could not classify.

To analyze model performance, the ROC curve, which shows the performance of a classification model at all classification thresholds, was analyzed (Fig 5). We calculated the accuracy, precision, recall, F1 score, and area under the curve (AUC) score (Table 2). ResNet50V2 showed 95.45% accuracy, 100% precision, 91% recall, a 95% F1 score, and a 75.2% AUC score. ResNet152 showed 95.45% accuracy, 100% precision, 91% recall, a 95% F1 score, and an 80.57% AUC score. InceptionResNetV2 showed 95.45% accuracy, 100% precision, 91% recall, a 95% F1 score, and an 87.6% AUC score. MobileNetV2 showed 90.91% accuracy, 100% precision, 82% recall, a 90% F1 score, and an 87.6% AUC score. Finally, Xception showed 95.45% accuracy, 92% precision, 100% recall, a 96% F1 score, and a 91.73% AUC score.

Fig 5. ROC curve of DL results in detecting feline HCM using VD X-ray images.


Five DL architectures were used and compared (ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception). The AUC was calculated and presented.

Table 2. Evaluation parameters of each deep neural network model (%): accuracy, precision, recall, F1 score, AUC score, sensitivity, and specificity.

Architecture        Accuracy    Precision    Recall (sensitivity)    F1 score    AUC score    Specificity
ResNet50V2          95.45       100          91                      95          75.20        100
ResNet152           95.45       100          91                      95          80.57        100
InceptionResNetV2   95.45       100          91                      95          87.60        100
MobileNetV2         90.91       100          82                      90          87.60        100
Xception            95.45       92           100                     96          91.73        90.91

Testing on additional data

One of the biggest concerns in deep learning is "peeking." Peeking is a severe problem in which the test data influence the learning process, consequently yielding high accuracy on the test data but erroneous outcomes on unseen data. Peeking is difficult to avoid entirely, yet we tried to prevent it.

To evaluate peeking in this study, we obtained normal and HCM-affected radiographic images (10 each) from an uninvolved veterinary hospital. We evaluated the models on the new data and compared the results with those on the previous data (Fig 6). On the new data, the accuracy was 75% for ResNet50V2, 70% for ResNet152, 85% for InceptionResNetV2, 55% for MobileNetV2, and 80% for Xception. The combined results were 86% for ResNet50V2, 83% for ResNet152, 90% for InceptionResNetV2, 76% for MobileNetV2, and 86% for Xception.

Fig 6. Comparison of accuracy of five DL models in old test data, new data, and combined results.


Five DL architectures were used and compared (ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception).

To evaluate whether the models distinguish the test dataset, we plotted the t-SNE of each model in Fig 7. The plotted test data for InceptionResNetV2, ResNet152, and Xception showed a distinct separation between normal and HCM. The accuracy and loss results are listed in S2 Table.

Ensemble strategy

To enhance accuracy, minimize the misdiagnoses of a single model, and mitigate peeking, we ensembled the results of the models in two ways: majority voting and softmax voting. We first tested on all test sets, including the old and new data (Fig 8A). Interestingly, both majority voting and softmax voting achieved 100% accuracy on the old data. On the new data, however, majority voting achieved only 85% accuracy, whereas softmax voting achieved 90%. On the combined results, majority voting achieved 93% accuracy and softmax voting 95%. The confusion matrix of the softmax voting strategy on the combined test data is plotted in Fig 8B.

Fig 8. Comparison of the accuracy of the majority voting and softmax voting strategies in old data, new data, and combined results (A), and confusion matrix of the softmax voting strategy with misdiagnosed images (B).

Discussion

Recently, there has been tremendous advancement in visual recognition within ML owing to the implementation of architectures inspired by the responses of neurons in the visual cortex, called convolutional neural networks (CNNs) [15]. Recent CNN architectures comprise optimized configurations of convolutional structures. Although many architectures are available, we trained five residual learning frameworks (ResNet50V2, ResNet152, InceptionResNetV2, MobileNetV2, and Xception) using 231 feline VD X-ray images and determined the optimal engine for diagnosing feline HCM. The accuracy of all engines was > 90%, with Xception being the most suitable architecture.

ResNet152 (a deep residual network) comprises 152 layers but has lower complexity than the VGG nets [16]. ResNet, developed by Microsoft, is deeper than the VGG nets developed in 2014 by the Visual Geometry Group at Oxford University. ResNet152 achieved excellent results in recognition tasks in the ImageNet and MS COCO competitions; specifically, it markedly lowered the top-5 error compared with previous neural networks. In our study, ResNet demonstrated > 90% accuracy; however, it showed several fluctuating patterns during the learning phase, as illustrated in the accuracy and loss graphs (Fig 2). ResNet50V2 is a 50-layer deep residual network developed by the inventors of ResNet152; it replaces each two-layer block of the 34-layer net with a three-layer bottleneck block.

InceptionResNetV2 was developed in conjunction with ResNet [17]. It introduced residual connections into the Inception architecture and demonstrated high performance in the 2015 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). Inception is a convolutional network introduced by Google at the 2014 ILSVRC and subsequently modified into version 4 (Inception-v4). InceptionResNetV2 is a combination of Inception-v4 and ResNet; by combining both architectures, it achieved a substantial error reduction, reaching a top-5 error of 4.9%. In our study, InceptionResNetV2 showed 95.45% accuracy in the test; however, its validation accuracy and validation loss graphs fluctuated.

MobileNet was designed in 2017 by Google for mobile environments; the architecture focuses on reducing weight and enhancing efficiency [18]. To achieve efficiency, depthwise separable convolution is applied, which uses a single filter for each input channel, reducing computation and model size. MobileNetV2 additionally uses linear bottlenecks, which incur lower information loss and computation than ReLU. In our study, MobileNetV2 showed an accuracy of 90.91%.

Xception was introduced in 2017 by Google. Inspired by Inception, Xception applies a modified depthwise separable convolution, proposed by François Chollet, to compute cross-channel correlations and spatial correlations independently, an approach termed "extreme Inception" [19]. Xception comprises 36 convolutional layers structured into 14 modules. Notably, the absence of a nonlinear activation function (such as ReLU or ELU) between the depthwise and pointwise operations increases accuracy. In our study, Xception showed the smoothest accuracy and loss graphs and the highest ROC curve. Based on these results, we deduced Xception to be the optimal architecture for diagnosing feline HCM.
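For illustration (not taken from the cited papers), the parameter saving of a depthwise separable convolution relative to a standard convolution can be seen directly in Keras:

```python
# Comparison of a standard convolution and a depthwise separable convolution (illustrative only).
import tensorflow as tf

x = tf.keras.Input(shape=(233, 233, 3))

standard = tf.keras.layers.Conv2D(64, 3, padding="same")(x)            # joint spatial + cross-channel filtering
separable = tf.keras.layers.SeparableConv2D(64, 3, padding="same")(x)  # depthwise filtering, then 1x1 pointwise

# The separable variant uses far fewer parameters for the same output shape:
print(tf.keras.Model(x, standard).count_params())    # 1,792 parameters
print(tf.keras.Model(x, separable).count_params())   # 283 parameters
```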

With the increase in computing capacity and processing ability, there has been extensive research on the application of DL in various veterinary diagnostic fields. Hattel et al. developed a two-stage algorithm that distinguished normal findings from several pathological diagnoses in bovine liver, lung, spleen, and kidney stained with hematoxylin and eosin (H&E) [20]. The algorithm was based on a support vector machine (SVM), one of the algorithms used in ML for classification and regression analysis. Several rat studies have focused on evaluating H&E-stained bone marrow. Kozlowski et al. developed an automated algorithm for quantifying bone marrow cell density [12]; they suggested that automated measures of bone marrow cellular depletion were more accurate than manual scoring by pathologists, and the algorithm was extended to quantify myeloid, erythroid, lymphoid, and megakaryocyte cell density. With the improvement in digital pathology, Kartasalo et al. investigated the three-dimensional reconstruction of H&E-stained slides [21]; specifically, they reconstructed the prostate and liver by annotating 2,448 landmarks in the slides. Beyond histology, the VETSCAN IMAGYST algorithm detects feline fecal parasites, including Toxocara, Trichuris, Ancylostoma, and taeniid eggs [11]. Additionally, hemosiderophages in equine bronchoalveolar lavage fluid have been quantified using a DL algorithm [10]. Regarding veterinary radiography, Banzato et al. developed an algorithm using DenseNet121 and ResNet50 to detect radiographic findings in canine latero-lateral (LL) X-ray images [13]. The findings were classified as cardiomegaly, alveolar pattern, bronchial pattern, interstitial pattern, mass, pleural effusion, pneumothorax, and megaesophagus. Although some findings showed an accuracy of < 70%, cardiomegaly was detected with an accuracy of up to 98% in some datasets.

Feline HCM is a multifactorial disease with various diagnostic protocols, including physical examination, genetic testing, NT-proBNP and troponin-I levels, blood pressure measurement, radiography, and ultrasound examination [7]. Recently, numerous biomarkers have been explored to diagnose feline HCM, including AIA, APOM, CPN1, and prothrombin [22]. The gold standards for diagnosing HCM in cats are radiography and ultrasound analysis. Radiography can identify HCM in cats through signs including cardiomegaly, auricular bulge, and silhouette changes [7], and thoracic radiography allows the identification of mild or moderate cardiac changes in cardiomyopathy. For objective diagnosis of cardiomegaly in dogs and cats, radiologists use the vertebral heart scale (VHS) [23]: by calculating the ratio of the long and short axes of the heart to vertebral size, a radiologist can determine whether the heart is enlarged. In addition, a modified VHS method enables measurement of left atrial size in the lateral view of thoracic radiography. However, the combination of these indices yielded an accuracy of only 75% [8]; therefore, various diagnostic protocols are recommended for confirming feline HCM. In our study, the diagnostic accuracy of thoracic radiography for feline HCM was 95%. Although sole dependence on thoracic radiography to diagnose feline HCM is risky, the model can suggest the possibility of feline HCM with approximately 95% accuracy.

Another important aspect of detecting HCM on radiographic images is that it could reduce costs and increase the detection rate during routine examinations. Because only specialized veterinarians can perform echocardiography and high-cost equipment is required, X-ray imaging and auscultation are the most frequently used tools for detecting feline HCM in local animal hospitals. By advising the veterinarian whether a cat could be affected by HCM based on radiographic images, local veterinarians can decide whether the animal should be referred to a specialist.

This study has some limitations, including the limited number of samples and the error rate. Because the study is restricted to feline cardiomyopathy and ventrodorsal (VD) radiographic images, a comparatively small number of images were available. However, we mitigated the limited dataset size by implementing data augmentation and the voting strategy. Further investigation is required to obtain more data, update the study, and enhance the accuracy. Although the voting strategy over the five models achieved 95% accuracy on the test data, the models misdiagnosed two HCM images as normal (Fig 8). To avoid such errors, the diagnosis should not rely on a single examination alone.

A good CNN model for diagnosing HCM must clearly distinguish HCM from a normal heart. To visualize how the models deal with the test data, we plotted a two-dimensional graph of how each model represents the test data. Because the models use high-dimensional features that cannot be plotted directly, the t-SNE function in the scikit-learn library was used [24]. The accuracy of MobileNetV2 was only 76% because the model did not separate the HCM data from the normal data. Further research is required to improve the separation of HCM data from normal data.

Peeking is one of the most challenging problems encountered by DL scientists. Peeking is associated with overfitting, in which repeated modification of model weights based on test data leads to poor outcomes on new data. Computer scientists have made several attempts to overcome the peeking phenomenon [25, 26]. In this study, we tried to prevent peeking by 1) dividing the data into training, validation, and test datasets, 2) using dropout between the dense layers, and 3) evaluating new data that had never been used in the training, validation, or testing process. Despite these precautions, we observed the peeking phenomenon in every model. To improve accuracy and reduce the misdiagnoses of a single model, we applied the voting strategy as advised by a reviewer; as a result, the accuracy improved to 95% on all test data combined.

Conclusion

Our findings demonstrate that the five DL architectures provided up to 95% accuracy on the original test data. Owing to the peeking phenomenon, all models showed lower accuracy on the new test data, with MobileNetV2 showing only 55% accuracy. However, by implementing a softmax voting strategy over all five models, 95% accuracy was achieved on the combined test data. In conclusion, the automated DL system achieved high performance and could help local veterinarians screen for HCM radiographically.

Supporting information

S1 Table. The category of the dataset used in the experiment.

(DOCX)

S2 Table. The accuracy of the five models on additional test data and combined results.

(DOCX)

Acknowledgments

This study was supported by Chungnam National University and the Korea Institute of Toxicology (KIT).

Data Availability

All relevant data are available on Figshare: 10.6084/m9.figshare.21128266.

Funding Statement

This study was supported by the Chungnam National University (2020-1438-01). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Payne JR, Brodbelt DC, Luis Fuentes V. Cardiomyopathy prevalence in 780 apparently healthy cats in rehoming centres (the CatScan study). J Vet Cardiol. 2015;17 Suppl 1:S244–57. doi: 10.1016/j.jvc.2015.03.008
  • 2. Riesen SC, Kovacevic A, Lombard CW, Amberger C. Prevalence of heart disease in symptomatic cats: an overview from 1998 to 2005. Schweiz Arch Tierheilkd. 2007;149(2):65–71. doi: 10.1024/0036-7281.149.2.65
  • 3. Nelson RW, Couto CG. Small animal internal medicine. 5th ed. St. Louis, MO: Elsevier/Mosby; 2014.
  • 4. Egenvall A, Nodtvedt A, Haggstrom J, Strom Holst B, Moller L, Bonnett BN. Mortality of life-insured Swedish cats during 1999–2006: age, breed, sex, and diagnosis. J Vet Intern Med. 2009;23(6):1175–83. doi: 10.1111/j.1939-1676.2009.0396.x
  • 5. Zachary JF, McGavin MD. Pathologic basis of veterinary disease. 5th ed. St. Louis, MO: Elsevier; 2012.
  • 6. Fox PR, Schober KA. Management of asymptomatic (occult) feline cardiomyopathy: Challenges and realities. J Vet Cardiol. 2015;17 Suppl 1:S150–8. doi: 10.1016/j.jvc.2015.03.004
  • 7. Luis Fuentes V, Abbott J, Chetboul V, Cote E, Fox PR, Haggstrom J, et al. ACVIM consensus statement guidelines for the classification, diagnosis, and management of cardiomyopathies in cats. J Vet Intern Med. 2020;34(3):1062–77. doi: 10.1111/jvim.15745
  • 8. Laudhittirut T, Rujivipat N, Saringkarisate K, Soponpattana P, Tunwichai T, Surachetpong SD. Accuracy of methods for diagnosing heart diseases in cats. Vet World. 2020;13(5):872–8. doi: 10.14202/vetworld.2020.872-878
  • 9. Turner OC, Aeffner F, Bangari DS, High W, Knight B, Forest T, et al. Society of Toxicologic Pathology Digital Pathology and Image Analysis Special Interest Group Article: Opinion on the Application of Artificial Intelligence and Machine Learning to Digital Toxicologic Pathology. Toxicol Pathol. 2020;48(2):277–94. doi: 10.1177/0192623319881401
  • 10. Marzahl C, Aubreville M, Bertram CA, Stayt J, Jasensky AK, Bartenschlager F, et al. Deep Learning-Based Quantification of Pulmonary Hemosiderophages in Cytology Slides. Sci Rep. 2020;10(1):9795. doi: 10.1038/s41598-020-65958-2
  • 11. Nagamori Y, Hall Sedlak R, DeRosa A, Pullins A, Cree T, Loenser M, et al. Evaluation of the VETSCAN IMAGYST: an in-clinic canine and feline fecal parasite detection system integrated with a deep learning algorithm. Parasit Vectors. 2020;13(1):346. doi: 10.1186/s13071-020-04215-x
  • 12. Kozlowski C, Brumm J, Cain G. An Automated Image Analysis Method to Quantify Veterinary Bone Marrow Cellularity on H&E Sections. Toxicol Pathol. 2018;46(3):324–35.
  • 13. Banzato T, Wodzinski M, Burti S, Osti VL, Rossoni V, Atzori M, et al. Automatic classification of canine thoracic radiographs using deep learning. Sci Rep. 2021;11(1):3964. doi: 10.1038/s41598-021-83515-3
  • 14. Jhang K. Voting and Ensemble Schemes Based on CNN Models for Photo-Based Gender Prediction. JIPS. 2020;16(4):809–19.
  • 15. Fukushima K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36(4):193–202. doi: 10.1007/BF00344251
  • 16. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016.
  • 17. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Thirty-First AAAI Conference on Artificial Intelligence; 2017.
  • 18. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. 2017.
  • 19. Chollet F. Xception: Deep learning with depthwise separable convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017.
  • 20. Hattel A, Monga V, Srinivas U, Gillespie J, Brooks J, Fisher J, et al. Development and evaluation of an automated histology classification system for veterinary pathology. J Vet Diagn Invest. 2013;25(6):765–9. doi: 10.1177/1040638713506901
  • 21. Kartasalo K, Latonen L, Vihinen J, Visakorpi T, Nykter M, Ruusuvuori P. Comparative analysis of tissue reconstruction algorithms for 3D histology. Bioinformatics. 2018;34(17):3013–21. doi: 10.1093/bioinformatics/bty210
  • 22. Liu M, Eckersall PD, Mrljak V, Horvatić A, Guillemin N, Galan A, et al. Novel biomarkers in cats with congestive heart failure due to primary cardiomyopathy. J Proteomics. 2020;226:103896. doi: 10.1016/j.jprot.2020.103896
  • 23. Guglielmini C, Diana A. Thoracic radiography in the cat: Identification of cardiomegaly and congestive heart failure. J Vet Cardiol. 2015;17 Suppl 1:S87–101. doi: 10.1016/j.jvc.2015.03.005
  • 24. van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res. 2008;9(86):2579–605.
  • 25. Kavur AE, Gezer NS, Baris M, Aslan S, Conze PH, Groza V, et al. CHAOS Challenge: combined (CT-MR) healthy abdominal organ segmentation. Med Image Anal. 2021;69:101950. doi: 10.1016/j.media.2020.101950
  • 26. Sima C, Dougherty E. The peaking phenomenon in the presence of feature-selection. Pattern Recognit Lett. 2008;29:1667–74.

Decision Letter 0

Jude Hemanth

4 Jul 2022

PONE-D-22-14859 Deep learning-based diagnosis of feline hypertrophic cardiomyopathy: Comparison of five neural engines. PLOS ONE

Dear Dr. Young Son,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: 

  • Concentrate on the dataset, its size, and the ground truth more to validate your results

Please submit your revised manuscript by Aug 18 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jude Hemanth

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in your Competing Interests section:  

"NO authors have competing interests"

Please complete your Competing Interests on the online submission form to state any Competing Interests. If you have no competing interests, please state ""The authors have declared that no competing interests exist."", as detailed online in our guide for authors at http://journals.plos.org/plosone/s/submit-now 

 This information should be included in your cover letter; we will change the online submission form on your behalf.

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Summary: Rho et al attempt to use deep learning networks to diagnose hypertrophic cardiomyopathy from radiographs of cats. Networks were trained on 231 radiographs (143 HCM vs 88 normal) and tested on a hold-out dataset of 22 radiographs (11 HCM vs 11 normals). Five different off-the-shelf network types were compared and all five gave identical results: 95% diagnostic accuracy (compared to 75% without AI).

Review:

I disclose that I have no expertise in veterinary/feline medicine but there does seem to be significant utility in such an algorithm. The manuscript is easy to follow and understand.

There are however some limitations:

1. the numbers of both training and test data (only 11 diseased subjects) are small. Probably too small to judge the algorithm's effectiveness on.

2. the validation set seems to be from the same institution and not independent, which limits the generalisability of the conclusions.

3. the paper only compares healthy cats to HCM. In humans, this is an easy problem -- the difficulty comes when you also consider other pathologies/phenotypes such as DCM, valve disease etc which probably give a similar cardiac silhouette. I would be interested to know how the algorithm performs on other pathologies.

4. they use the opinion of an experienced reader as ground truth -- this is not very strong. Wouldn't echo/ultrasound give you a much stronger gold standard?

5. the comparison of 5 network architectures does not add much -- particularly since results are identical.

Minor comments

1. while the paper is easy to follow there are several grammatical errors which could do with correction

2. were images augmented for training? Presumably so given the small size of the training data but the parameters need to be stated.

3. Several performance measures are reported (e.g. in table 1), but sensitivity and specificity would be most useful

4. please include more details on ethics.

Reviewer #2: As its name indicates, this paper deals with Deep learning-based diagnosis of feline hypertrophic cardiomyopathy through Comparison of five neural engines

1. I do not have any comments about the deep models, but the outcomes of the simulations are given only quantitatively. It would be nice to have some expert feedbacks for evaluating the results.

2. The authors should also show that the results explained in this article have fair developing stages without making “peeking”. Briefly, peeking is using testing datasets for validation purposes (such as parameter tuning) by making too many iterative submissions (Kuncheva (2014), page 17). In other words, testing sets, which should only be in the previously unseen data, now do not serve testing purposes anymore. Even though it is an indirect usage, peeking is surprisingly an underestimated problem in academic studies which causes overfitting of deep models on a target data. Therefore, theoretically very successful models for specific data may not be useful for real-world problems.

Kavur, A. Emre, et al. "CHAOS challenge-combined (CT-MR) healthy abdominal organ segmentation." Medical Image Analysis 69 (2021): 101950.

The authors should at least discuss the potential effects of peeking on their results.

They should also consider, (maybe apply if possible) and discuss strategies that are being used to avoid peeking such as:

Selver. et al "Basic Ensembles of Vanilla-Style Deep Learning Models Improve Liver Segmentation From CT Images." arXiv preprint arXiv:2001.09647 (2020).

Conze, Pierre-Henri, et al. "Abdominal multi-organ segmentation with cascaded convolutional and adversarial deep networks." Artificial Intelligence in Medicine 117 (2021): 102109.

3. Since the authors used many models to compare the results, they can also test the ensemble strategies, which are shown to outperform single model results at medical image analysis applications such as :

Menze, Bjoern H., et al. "The multimodal brain tumor image segmentation benchmark (BRATS)." IEEE transactions on medical imaging 34.10 (2014): 1993-2024.

Kavur, A. Emre, et al. "Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors." Diagnostic and Interventional Radiology 26.1 (2020): 11.

4. The diversity and complementarity of the utilized models should also be analyzed through statistics and Kappa. An exemplary analysis can be found at

Toprak, Tugce, et al. "Conditional weighted ensemble of transferred models for camera based onboard pedestrian detection in railway driver support systems." IEEE Transactions on Vehicular Technology 69.5 (2020): 5041-5054.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Rhodri H Davies

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Feb 2;18(2):e0280438. doi: 10.1371/journal.pone.0280438.r002

Author response to Decision Letter 0


7 Oct 2022

[2022.09.16.]

Dear Reviewers:

We thank you and the reviewers for your thoughtful suggestions and insights. The manuscript has benefited from these insightful suggestions. I look forward to working with you and the reviewers to move this manuscript closer to publication in PLOS ONE.

The manuscript has been rechecked and the necessary changes have been made in accordance with the reviewers’ suggestions. The responses to all comments have been prepared and attached herewith/given below.

Thank you for your consideration. I look forward to hearing from you.

Sincerely,

Hwa-Young Son

College of Veterinary Medicine,

Chungnam National University, Daejeon,

Republic of Korea

E-mail: hyson@cnu.ac.kr

Reviewer #1: Summary: Rho et al attempt to use deep learning networks to diagnose hypertrophic cardiomyopathy from radiographs of cats. Networks were trained on 231 radiographs (143 HCM vs 88 normal) and tested on a hold-out dataset of 22 radiographs (11 HCM vs 11 normals). Five different off-the-shelf network types were compared and all five gave identical results: 95% diagnostic accuracy (compared to 75% without AI).

Review:

I disclose that I have no expertise in veterinary/feline medicine but there does seem to be significant utility in such an algorithm. The manuscript is easy to follow and understand.

There are however some limitations:

1. the numbers of both training and test data (only 11 diseased subjects) are small. Probably too small to judge the algorithm's effectiveness on.

- Thank you for your advice. We understand your concerns regarding the small test dataset. Because of the scarcity of legally available radiographic images of HCM-affected cats, we had no choice but to focus on training the model. To enhance the credibility of our model, an additional 10 radiographic images each of normal and HCM-affected cats from another animal hospital were evaluated.

2. the validation set seems to be from the same institution and not independent, which limits the generalisability of the conclusions.

- Thank you for the critical advice on the generalizability of the images. The images were obtained not only from the teaching hospital but also from three other local hospitals. We did not comment on the data's origin because of ethical and privacy concerns. Here, we included a table describing the origin of the images (although the specific names of the institutions are hidden) in Supplementary Table 1.

3. the paper only compares healthy cats to HCM. In humans, this is an easy problem -- the difficulty comes when you also consider other pathologies/phenotypes such as DCM, valve disease etc which probably give a similar cardiac silhouette. I would be interested to know how the algorithm performs on other pathologies.

- We appreciate your advice. The prevalence of cardiac disease in cats is quite different from that in other animals or humans. The most frequent and predominant cardiac disease is HCM, and other cardiac diseases are very rare in cats. Of 306 primary cardiac disorders from 1998 to 2005, 252 cases (82%) were cardiomyopathy and 48 cases (16%) were congenital heart disease. We unfortunately failed to obtain radiographic images of other cardiac diseases in cats. Therefore, diagnosing whether it is DCM from a radiographic image may be most important for cats.

S.C. Riesen, A. Kovacevic, et al. Prevalence of heart disease in symptomatic cats: an overview from 1998 to 2005

Richard W. Nelson, C. Guillermo Couto, Small animal internal medicine 5th edition.

4. they use the opinion of an experienced reader as ground truth -- this is not very strong. Wouldn't echo/ultrasound give you a much stronger gold standard?

- We thank you for the interesting question. As mentioned, the gold standard for diagnosing heart disease is an ultrasound.

Because only specialized veterinarians can perform echocardiography and an ultrasound machine is required, most veterinarians rely on X-ray and auscultation. The animal is referred to a specialized animal hospital if cardiac disease is suspected on auscultation or radiography. At this point, the deep learning algorithm we developed can suggest and screen whether the animal may be affected by HCM at the X-ray level.

5. the comparison of 5 network architecture does not add much -- particularly since results are identical.

- Thank you for the advice. As the accuracy of each architecture is predominantly high, comparing the five models may seem futile. However, on detailed observation, you will see that the misdiagnosed images differ between models except for the ResNets. The data show that each model has different flaws that drive misdiagnosis.

Based on your opinion that the results are identical, we ensembled the trained architectures. The ensemble works by considering the five prediction values made by the individual models and making a final decision. As a result, the ensemble model achieved 100% accuracy on the old test sample.

Minor comments

1. while the paper is easy to follow there are several grammatical errors which could do with correction

- Thank you for the detailed advice. We requested a re-assessment of the English and modified it accordingly.

2. were images augmented for training? Presumably so given the small size of the training data but the parameters need to be stated.

- Thank you for the specific advice. We strengthened the detailed description of data augmentation.

3. Several performance measures are reported (e.g. in table 1), but sensitivity and specificity would be most useful

- Thank you for the thorough suggestion. We added sensitivity and specificity to the table.

4. please include more details on ethics.

- We appreciate your advice. We elaborated on the details of the ethics.

Reviewer #2: As its name indicates, this paper deals with Deep learning-based diagnosis of feline hypertrophic cardiomyopathy through Comparison of five neural engines

1. I do not have any comments about the deep models, but the outcomes of the simulations are given only quantitatively. It would be nice to have some expert feedbacks for evaluating the results.

- Thank you for the advice. We invited an expert in the computer vision field to participate in the revision as an author.

2. The authors should also show that the results explained in this article have fair developing stages without making “peeking”. Briefly, peeking is using testing datasets for validation purposes (such as parameter tuning) by making too many iterative submissions (Kuncheva (2014), page 17). In other words, testing sets, which should only be in the previously unseen data, now do not serve testing purposes anymore. Even though it is an indirect usage, peeking is surprisingly an underestimated problem in academic studies which causes overfitting of deep models on a target data. Therefore, theoretically very successful models for specific data may not be useful for real-world problems.

Kavur, A. Emre, et al. "CHAOS challenge-combined (CT-MR) healthy abdominal organ segmentation." Medical Image Analysis 69 (2021): 101950.

The authors should at least discuss the potential effects of peeking on their results.

They should also consider, (maybe apply if possible) and discuss strategies that are being used to avoid peeking such as:

Selver. et al. "Basic Ensembles of Vanilla-Style Deep Learning Models Improve Liver Segmentation From CT Images." arXiv preprint arXiv:2001.09647 (2020).

Conze, Pierre-Henri, et al. "Abdominal multi-organ segmentation with cascaded convolutional and adversarial deep networks." Artificial Intelligence in Medicine 117 (2021): 102109.

- Thank you for the crucial advice. As you mention, peeking is a serious pitfall when developing a model. To avoid peeking, we split the validation set from the training set during training and evaluated only on previously unseen samples. In addition, we obtained additional radiographs from another veterinary hospital and tested the models on them. Although the accuracy on these additional samples was lower, it still exceeded 78%.

We added these findings to the results section.
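
As a sketch of the splitting procedure described above (validation drawn only from the training pool, with the external-hospital radiographs held out as an untouched test set); the function and variable names are illustrative:

from sklearn.model_selection import train_test_split

def split_without_peeking(x, y, val_frac=0.2, seed=42):
    # Validation is carved out of the training data only; the external test set is never
    # used for tuning, which is what guards against the "peeking" the reviewer describes.
    x_train, x_val, y_train, y_val = train_test_split(
        x, y, test_size=val_frac, stratify=y, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val)

# The radiographs obtained from the other veterinary hospital serve as the unseen test set.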

3. Since the authors used many models to compare the results, they can also test ensemble strategies, which have been shown to outperform single-model results in medical image analysis applications such as:

Menze, Bjoern H., et al. "The multimodal brain tumor image segmentation benchmark (BRATS)." IEEE Transactions on Medical Imaging 34.10 (2014): 1993-2024.

Kavur, A. Emre, et al. "Comparison of semi-automatic and deep learning-based automatic methods for liver segmentation in living liver transplant donors." Diagnostic and Interventional Radiology 26.1 (2020): 11.

- We appreciate your important advice. As you mentioned, an ensemble strategy can outperform a single model when the individual networks show high variance in their predictions. We applied a voting ensemble to the five models and achieved 100% accuracy on the test sample. On the additional sample, however, the ensemble reached only 80% accuracy, whereas the best single model (Xception) achieved 97%.

This finding has been added to the results section.

4. The diversity and complementarity of the utilized models should also be analyzed through statistics and Kappa. An exemplary analysis can be found at

Toprak, Tugce, et al. "Conditional weighted ensemble of transferred models for camera based onboard pedestrian detection in railway driver support systems." IEEE Transactions on Vehicular Technology 69.5 (2020): 5041-5054.

- Thank you for the critical advice on our research. In our opinion, kappa analysis may be helpful when there is no exact answer to the problem (e.g., unsupervised training). However, it is less applicable in this study because veterinary specialists confirmed all of the data. We have already compared the results of each model and reported the accuracy of each.
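
Should readers wish to quantify inter-model agreement as the reviewer suggests, pairwise Cohen's kappa between model predictions is straightforward; a minimal sketch with illustrative names:

from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_kappa(pred_by_model):
    # pred_by_model: dict mapping model name -> 1-D array of predicted class labels.
    return {
        (a, b): cohen_kappa_score(pred_by_model[a], pred_by_model[b])
        for a, b in combinations(pred_by_model, 2)
    }

# Example: pairwise_kappa({"ResNet50V2": y1, "Xception": y2, "MobileNetV2": y3})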

Attachment

Submitted filename: Response to Reviers.docx

Decision Letter 1

Jude Hemanth

8 Nov 2022

PONE-D-22-14859R1

Deep learning-based diagnosis of feline hypertrophic cardiomyopathy: Comparison of five deep neural network models

PLOS ONE

Dear Dr. Son

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR:

  • Revise

==============================

Please submit your revised manuscript by Dec 23 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jude Hemanth

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you to the authors for taking our comments on board and adapting the manuscript -- a lot of effort has gone into this, and the manuscript is improved as a result.

In summary, the test data has expanded, but still on the small side (though I appreciate the scarcity of data) and the ground truth is slightly suboptimal (but consistent with what is used in clinical practice). The authors have answered all my comments and I have responded to these, with my responses denoted by '>>'.

1. the numbers of both training and test data (only 11 diseased subjects) is small. Probably too small to judge the algorithm's effectiveness on.

- Thank you for your advice. We understand your concerns regarding the small test datasets. Because of the scarcity of legally available radiographic images of HCM-affected cats, we had no choice but to focus on training the model. To enhance the credibility of our model, an additional 10 radiographic images of normal and HCM-affected cats from another animal hospital were evaluated.

>> thank you. It would be worth emphasising in the main text that you now have more independent test data (the text still reads 11 HCM and 11 normals).

2. the validation set seems to be from the same institution and not independent, which limits the generalisability of the conclusions.

- Thank you for the critical advice on the generalizability of the images. The images were obtained not only from the teaching hospital but also from three other local hospitals. We did not comment on the data's origin because of ethical and privacy concerns. Here, we have included a table describing the origin of the images (with the specific institution names hidden) in Supplementary Table 1.

>>thanks. it would be well worth emphasising this in the main text.

3. the paper only compares healthy cats to HCM. In humans, this is an easy problem -- the difficulty comes when you also consider other pathologies/phenotypes such as DCM, valve disease etc which probably give a similar cardiac silhouette. I would be interested to know how the algorithm performs on other pathologies.

- We appreciate your advice. The prevalence of cardiac disease in cats is quite different from that in other animals or humans. HCM is the most frequent and predominant cardiac disease, and other cardiac diseases are very rare in cats. Of 306 primary cardiac disorders recorded from 1998 to 2005, 252 cases (82%) were cardiomyopathy and 48 cases (16%) were congenital heart disease. Unfortunately, we were unable to obtain radiographic images of other cardiac diseases in cats. Therefore, determining whether a radiographic image shows HCM may be the most important task in cats.

S.C. Riesen, A. Kovacevic, et al. Prevalence of heart disease in symptomatic cats: an overview from 1998 to 2005.

Richard W. Nelson, C. Guillermo Couto. Small Animal Internal Medicine, 5th edition.

>>thanks for clarifying -- this makes sense. Could this information (and the references) be added to the main body of the paper?

4. they use the opinion of an experienced reader as ground truth -- this is not very strong. Wouldn't echo/ultrasound give you a much stronger gold standard?

- We thank you for the interesting question. As mentioned, the gold standard for diagnosing heart disease is echocardiography (ultrasound).

However, because only specialized veterinarians can perform echocardiography and an ultrasound machine is required, most veterinarians rely on X-rays and auscultation. The animal is referred to a specialized animal hospital if cardiac disease is suspected on auscultation or radiography. At this point, the deep-learning algorithm we developed can screen and suggest whether the animal may be affected by HCM at the X-ray level.

>>OK, thanks. Like me, many PLOS readers will also have little expertise in feline medicine and the availability of data so it would be worth emphasising this in the text. It would also be worth adding a discussion about the limitations (subjectivity, error and variability of experts etc) in the discussion.

5. the comparison of 5 network architecture does not add much -- particularly since results are identical.

- Thank you for the advice. Because the accuracy of every architecture is uniformly high, comparing the five models may appear futile. However, on detailed inspection, the misdiagnosed images differ between models (except for the two ResNets). This shows that each model has different weaknesses that drive its misdiagnoses.

Based on your observation that the results are identical, we ensembled the trained architectures. The ensemble considers the five individual predictions and makes the final decision from them. As a result, the ensemble model achieved 100% accuracy on the original test sample.

>>Thanks. You might want to reconsider the title etc. The main contribution to the paper is a novel algorithm for diagnosing feline HCM from X-rays. Adding the subtitle about 5 neural nets downplays your contribution a little...

Thank you for addressing the other comments -- they have been done satisfactorily.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Feb 2;18(2):e0280438. doi: 10.1371/journal.pone.0280438.r004

Author response to Decision Letter 1


8 Dec 2022

[2022.12.09.]

PLOS ONE

Dear Editor:

We wish to re-submit the manuscript titled “Deep learning-based diagnosis of feline hypertrophic cardiomyopathy: Comparison of five neural engines.” The manuscript ID is D-22-14859.

We thank you and the reviewer for your considerate suggestions and advice. The manuscript has benefited from these insightful suggestions. I look forward to working with you and the reviewers to move this manuscript closer to publication in PLOS ONE.

The manuscript has been rechecked and the necessary changes have been made in accordance with the reviewers’ suggestions. The responses to all comments have been prepared and attached herewith/given below.

Thank you for your consideration. I look forward to hearing from you.

Sincerely,

Hwa-Young Son

College of Veterinary Medicine,

Chungnam National University, Daejeon,

Republic of Korea

E-mail: hyson@cnu.ac.kr

Reviewer #1: Thank you to the authors for taking our comments on board and adapting the manuscript -- a lot of effort has gone into this, and the manuscript is improved as a result.

In summary, the test data has expanded, but still on the small side (though I appreciate the scarcity of data) and the ground truth is slightly suboptimal (but consistent with what is used in clinical practice). The authors have answered all my comments and I have responded to these, with my responses denoted by '>>'.

1. the numbers of both training and test data (only 11 diseased subjects) is small. Probably too small to judge the algorithm's effectiveness on.

- Thank you for your advice. We understand your concerns regarding the small test datasets. Because of the scarcity of legally available radiographic images of HCM-affected cats, we had no choice but to focus on training the model. To enhance the credibility of our model, an additional 10 radiographic images of normal and HCM-affected cats from another animal hospital were evaluated.

>> thank you. It would be worth emphasising in the main text that you now have more independent test data (the text still reads 11 HCM and 11 normals).

- Thank you for the advice. We changed the main text to state that 42 images were used in the test dataset.

2. the validation set seems to be from the same institution and not independent, which limits the generalisability of the conclusions.

- Thank you for the critical advice on the generalizability of the images. The images were obtained not only from the teaching hospital but also from three other local hospitals. We did not comment on the data's origin because of ethical and privacy concerns. Here, we have included a table describing the origin of the images (with the specific institution names hidden) in Supplementary Table 1.

>>thanks. it would be well worth emphasising this in the main text.

- Thank you. A statement on the generalizability of the data has been added to the abstract and the Materials and Methods. To emphasize the five institutions, we moved Supplementary Table 1 into Table 1.

3. the paper only compares healthy cats to HCM. In humans, this is an easy problem -- the difficulty comes when you also consider other pathologies/phenotypes such as DCM, valve disease etc which probably give a similar cardiac silhouette. I would be interested to know how the algorithm performs on other pathologies.

- We appreciate your advice. The prevalence of cardiac disease in cats is quite different from that in other animals or humans. HCM is the most frequent and predominant cardiac disease, and other cardiac diseases are very rare in cats. Of 306 primary cardiac disorders recorded from 1998 to 2005, 252 cases (82%) were cardiomyopathy and 48 cases (16%) were congenital heart disease. Unfortunately, we were unable to obtain radiographic images of other cardiac diseases in cats. Therefore, determining whether a radiographic image shows HCM may be the most important task in cats.

S.C. Riesen, A. Kovacevic, et al. Prevalence of heart disease in symptomatic cats: an overview from 1998 to 2005.

Richard W. Nelson, C. Guillermo Couto. Small Animal Internal Medicine, 5th edition.

>>thanks for clarifying -- this makes sense. Could this information (and the references) be added to the main body of the paper?

- Thank you for the advice. This information has been added to the Introduction.

4. they use the opinion of an experienced reader as ground truth -- this is not very strong. Wouldn't echo/ultrasound give you a much stronger gold standard?

- We thank you for the interesting question. As mentioned, the gold standard for diagnosing heart disease is echocardiography (ultrasound).

However, because only specialized veterinarians can perform echocardiography and an ultrasound machine is required, most veterinarians rely on X-rays and auscultation. The animal is referred to a specialized animal hospital if cardiac disease is suspected on auscultation or radiography. At this point, the deep-learning algorithm we developed can screen and suggest whether the animal may be affected by HCM at the X-ray level.

>>OK, thanks. Like me, many PLOS readers will also have little expertise in feline medicine and the availability of data so it would be worth emphasising this in the text. It would also be worth adding a discussion about the limitations (subjectivity, error and variability of experts etc) in the discussion.

- Thank you for the advice. Background on feline medicine and the availability of data has been added to the Discussion.

5. the comparison of 5 network architecture does not add much -- particularly since results are identical.

- Thank you for the advice. Because the accuracy of every architecture is uniformly high, comparing the five models may appear futile. However, on detailed inspection, the misdiagnosed images differ between models (except for the two ResNets). This shows that each model has different weaknesses that drive its misdiagnoses.

Based on your observation that the results are identical, we ensembled the trained architectures. The ensemble considers the five individual predictions and makes the final decision from them. As a result, the ensemble model achieved 100% accuracy on the original test sample.

>>Thanks. You might want to reconsider the title etc. The main contribution to the paper is a novel algorithm for diagnosing feline HCM from X-rays. Adding the subtitle about 5 neural nets downplays your contribution a little...

- Thank you for the advice. We changed the title so that it focuses on the novel algorithm and the ensemble method.

Attachment

Submitted filename: Response to Reviers.docx

Decision Letter 2

Jude Hemanth

2 Jan 2023

Deep learning-based diagnosis of feline hypertrophic cardiomyopathy

PONE-D-22-14859R2

Dear Dr. Son

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jude Hemanth

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you answering my queries. All my comments have been addressed.

Congratulations on a nice piece of work

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Jude Hemanth

24 Jan 2023

PONE-D-22-14859R2

Deep learning-based diagnosis of feline hypertrophic cardiomyopathy

Dear Dr. Son:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jude Hemanth

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. The category of the dataset used in the experiment.

    (DOCX)

    S2 Table. The accuracy of the five models on the additional test data and the combined results.

    (DOCX)

    Attachment

    Submitted filename: Response to Reviers.docx

    Attachment

    Submitted filename: Response to Reviers.docx

    Data Availability Statement

    All relevant data are available on Figshare: 10.6084/m9.figshare.21128266.


    Articles from PLOS ONE are provided here courtesy of PLOS
