Author manuscript; available in PMC: 2022 May 14.
Published in final edited form as: Wirel Commun Mob Comput. 2021 Jul 1;2021:1–17. doi: 10.1155/2021/5792975

SOSPCNN: Structurally Optimized Stochastic Pooling Convolutional Neural Network for Tetralogy of Fallot recognition

Shui-Hua Wang 1,#, Kaihong Wu 2,#, Tianshu Chu 3, Steven L Fernandes 4, Qinghua Zhou 1, Yu-Dong Zhang 1,3,*, Jian Sun 2,*
PMCID: PMC7612722  EMSID: EMS144800  PMID: 35573891

Abstract

Aim

This study proposes a new artificial intelligence model based on cardiovascular computed tomography for more efficient and precise recognition of Tetralogy of Fallot (TOF).

Methods

Our model is a structurally optimized stochastic pooling convolutional neural network (SOSPCNN), which combines stochastic pooling, structural optimization, and convolutional neural network. In addition, multiple-way data augmentation is used to overcome overfitting. Grad-CAM is employed to provide explainability to the proposed SOSPCNN model. Meanwhile, both desktop and web apps are developed based on this SOSPCNN model.

Results

The results on ten runs of 10-fold cross-validation show that our SOSPCNN model yields a sensitivity of 92.25±2.19, a specificity of 92.75±2.49, a precision of 92.79±2.29, an accuracy of 92.50±1.18, an F1 score of 92.48±1.17, an MCC of 85.06±2.38, an FMI of 92.50±1.17, and an AUC of 0.9587.

Conclusion

The SOSPCNN method performed better than three state-of-the-art TOF recognition approaches.

Keywords: Tetralogy of Fallot, computed tomography, artificial intelligence, machine learning, deep learning, stochastic pooling, structural optimization, convolutional neural network, multiple-way data augmentation, Grad-CAM, cross-validation, deep neural network

1. Introduction

Tetralogy of Fallot (TOF) is a congenital defect that influences normal blood flow through the heart [1]. It is made up of four defects of the heart and its blood vessels [2]: (a) ventricular septal defect; (b) overriding aorta; (c) right ventricular outflow tract stenosis; and (d) right ventricular hypertrophy. These defects reduce the amount of oxygen in the blood that flows to the rest of the body. Infants with TOF have a bluish-looking skin color [3] because their blood does not carry enough oxygen.

TOF is traditionally diagnosed after birth, often after the infant has had an episode of cyanosis during crying or feeding. The most common test is an echocardiogram [4], an ultrasound of the heart that can show problems with the heart structure and how well the heart is working with this defect. Recently, computed tomography (CT) has shown its success in the differential diagnosis of TOF [5], since it can provide detailed images of many types of cardiovascular issues; besides, CT can be performed even if the subject has an implanted medical device, unlike magnetic resonance imaging (MRI) [6].

Manual diagnosis on CT is labor-intensive, onerous, and needs expert skills. Besides, manual results vary due to intra-expert and inter-expert factors. Shan, F. et al. (2021) [7] note that fully manual delineation often takes hours, whereas modern automatic diagnosis models based on artificial intelligence (AI) can reach decisions in seconds to minutes; this has become a hot research field.

For example, Ye, D. H. et al. (2011) [8] present a morphological classification (MC) method, in which morphological features are extracted by registering cardiac MRI scans to a template. Later, deep learning (DL) rose as a new type of AI technique and has shown its power in many academic and industrial fields. Within DL, the convolutional neural network (CNN) is one standard algorithm that is particularly suitable for handling images. Giannakidis, A. et al. (2016) [9] presented a multi-scale three-dimensional CNN (3DCNN) for segmentation of the right ventricle. Tandon, A. et al. (2021) [10] present a ventricular contouring CNN (VCCNN) algorithm.

The difference between this study and previous studies is that we simplify the problem to a binary-coded classification problem [11]; that is, given an input cardiovascular CT image, the AI model should give a binary output, i.e., predict whether the subject is TOF or healthy. This simplification lets the AI model focus on the prediction task itself, without needing to generate human-understandable outputs (such as segmentation, contouring, etc.), in the expectation of making the model more accurate. Furthermore, we propose a new stochastic pooling CNN that uses a new pooling technique—stochastic pooling—to improve prediction performance. In all, our contributions are four-fold:

  (a) Stochastic pooling is employed to replace traditional max-pooling.
  (b) Structural optimization is carried out to fix the optimal structure.
  (c) Multiple-way DA is introduced to increase the diversity of training images.
  (d) Experiments by ten runs of 10-fold cross-validation show that our method is better than three state-of-the-art approaches.

The rest of this paper is structured as follows: Section 2 describes the dataset. Section 3 contains the rationale of methodology, including the preprocessing, stochastic pooling, structural optimization, multiple-way data augmentation, the implementation, Grad-CAM, and evaluation measures. Section 4 presents the experimental results and discussions. Section 5 concludes this paper.

2. Dataset

This study is retrospective research, for which ethical approval was exempted. The imaging protocol is as follows: Philips Brilliance 256-row spiral CT machine; kV: 80; mAs: 138; layer thickness: 0.8 mm; lung window (W: 1600 HU, L: -600 HU); mediastinal window (W: 750 HU, L: 90 HU); thin-layer reconstruction according to the lesion display, with layer thickness and layer distance both 0.8 mm for mediastinal window images. The patient was placed in a supine position and asked to hold a deep breath, and scanning proceeded conventionally from the apex of the lung to the costophrenic angle. The resolution of all images is 512 by 512 pixels. Data are available upon reasonable request to the corresponding authors.

We selected ten children with Tetralogy of Fallot who were admitted to Nanjing Children’s Hospital from March 2017 to March 2020. We then used a systematic random sampling method to select ten normal children from healthy children undergoing medical examination within the same period. The Tetralogy of Fallot (TOF) observation group included three males and seven females, aged 4-22 months, with an average age of 8.90±5.47 months. Normal children in the control group included six males and four females, aged 3 to 24 months, with an average age of 10.4±8.14 months. Inclusion criteria for children with confirmed Tetralogy of Fallot are as follows:

  (1) CT suggests Tetralogy of Fallot.
  (2) Surgery confirmed that the anatomical deformity of the heart is Tetralogy of Fallot.

3. Methodology

3.1. Preprocessing

Table 1 lists the abbreviations used in this paper for ease of reading. A five-step preprocessing was carried out on all the images to select the important slices, save storage, enhance contrast, remove unnecessary image regions, and reduce the image resolution.

Table 1. Abbreviation List.

Abbreviation Meaning
AP average pooling
AUC area under the curve
BN batch normalization
CNN convolutional neural network
CT computed tomography
DA data augmentation
DPD discrete probability distribution
FCL fully connected layer
FMI Fowlkes-Mallows index
Grad-CAM gradient-weighted class activation mapping
GUI graphical user interface
HS histogram stretching
L2P l2-norm pooling
MCC Matthews correlation coefficient
MP max-pooling
MRI magnetic resonance imaging
MSD mean and standard deviation
PM probability map
ReLU rectified linear unit
RLV random location vector
ROC receiver operating characteristic
SC strided convolution
SP stochastic pooling
TOF Tetralogy of Fallot

First, four slices per subject were chosen by radiologists using a slice-level selection method. For TOF patients, the slices showing the largest size and number of lesions were selected. For healthy control (HC) subjects, any slice level could be selected. In total, we have 40 TOF images and 40 HC images.

Second, all the images are converted to grayscale and stored in tiff format [12] using lossless compression. Third, histogram stretching (HS) was employed to enhance image contrast. Suppose the k-th input and output of HS are x(k) and y(k). HS can be formulated as

y(k) = [x(k) − x_min(k)] / [x_max(k) − x_min(k)] (1)

where x_min(k) and x_max(k) stand for the minimum and maximum grayscale values in the input x(k).
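As a minimal sketch (the paper publishes no code, and the function name here is illustrative), the histogram stretching of Eq. (1) can be written in Python/NumPy:

```python
import numpy as np

def histogram_stretch(x: np.ndarray) -> np.ndarray:
    """Histogram stretching of Eq. (1): rescale grayscale values to [0, 1]."""
    x = x.astype(np.float64)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:              # flat image: avoid division by zero
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

# Example: a tiny 2x2 "image" with grayscale values 50..200
img = np.array([[50, 100], [150, 200]])
stretched = histogram_stretch(img)  # values rescaled to 0, 1/3, 2/3, 1
```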

Fourth, cropping was done in order to eliminate the check-up bed at the bottom, the subject’s two arms at bilateral sides, the rulers at the bottom and right side, and information (hospital, scanning protocol, subject’s information, image head information, and labeling) at four corners.

Lastly, downscaling was performed to reduce each image to the size of 256 × 256. Figure 1 displays the diagram of our preprocessing procedure. Figure 2(a & b) shows two preprocessed examples of TOF and HC, respectively.

Figure 1. Diagram of preprocessing.


Figure 2. Illustration of our dataset.


3.2. Stochastic Pooling

Pooling is an essential operation in standard convolutional neural networks (CNNs) [13]. Two types of pooling exist. One is max pooling (MP), and the other is average pooling (AP). The objective of pooling is to down-sample an input image or feature map (FM), reducing their dimensionality (width or height) and allowing for some assumption about the features to be made in each block.

Suppose we have an input image or FM that can be split into O1 × O2 blocks, where every block has the extent Q1 × Q2. Now let us focus on the block B_{o1,o2} at the o1-th row and o2-th column, shown as the red rectangle in Figure 3.

B_{o1,o2} = {b(x, y), x = 1, …, Q1, y = 1, …, Q2}, (2)

where 1 ≤ o1 ≤ O1, 1 ≤ o2 ≤ O2, and b(x, y) is the pixel value at coordinate (x, y).

Figure 3. A diagram of block-wise pooling.


The strided convolution (SC) goes over the input activation map with a stride equal to the block size (Q1, Q2). The output of SC is:

B_{o1,o2}^{SC} = b(1, 1). (3)

The l2-norm pooling (L2P), average pooling (AP) [14], and max pooling (MP) [15] produce the l2-norm, average, and maximum values within the block B_{o1,o2}, respectively. Their formulas can be written as below:

B_{o1,o2}^{L2P} = √( Σ_{x=1}^{Q1} Σ_{y=1}^{Q2} b²(x, y) / (Q1 × Q2) ), (4)
B_{o1,o2}^{AP} = [1 / (Q1 × Q2)] Σ_{x=1}^{Q1} Σ_{y=1}^{Q2} b(x, y), (5)
B_{o1,o2}^{MP} = max_{x=1,…,Q1} max_{y=1,…,Q2} b(x, y). (6)

Nevertheless, AP outputs the average, diluting the greatest value, where the important features may lie. In contrast, MP keeps the greatest value but aggravates overfitting. To address both concerns, stochastic pooling (SP) [15] is introduced as a resolution to the drawbacks of AP and MP. SP is a four-step process.

Step 1, it produces the probability map (PM) VPM for each pixel in the block Bo1,o2.

V_PM(x, y) = b(x, y) / [Σ_{x=1}^{Q1} Σ_{y=1}^{Q2} b(x, y)], s.t. Σ_{x=1}^{Q1} Σ_{y=1}^{Q2} V_PM(x, y) = 1 (7)

where VPM(x, y) stands for the PM value at pixel (x, y).

Step 2, it creates a random location vector (RLV) r=(xr,yr) that takes the discrete probability distribution (DPD) as

P[r = (1,1)] = V_PM(1,1), P[r = (1,2)] = V_PM(1,2), …, P[r = (Q1,Q2)] = V_PM(Q1,Q2) (8)

where P represents the probability. In short, P[r = (x, y)] = V_PM(x, y), ∀ 1 ≤ x ≤ Q1 and 1 ≤ y ≤ Q2, i.e., r ~ V_PM: the RLV r follows the DPD given by V_PM.

Step 3, a sample location vector r0 is drawn from the RLV r, and we have

r_0 = (x_{r0}, y_{r0}) (9)

Step 4, SP outputs the pixel at the location r0, namely

B_{o1,o2}^{SP} = b(x_{r0}, y_{r0}) (10)

Algorithm 1. Pseudocode of SP.

Input: Block B_{o1,o2}.
Step 1: Produce the PM V_PM for each pixel. See Eq. (7).
Step 2: Create an RLV r = (x_r, y_r). See Eq. (8).
Step 3: Draw a sample location vector r_0 from the RLV r. See Eq. (9).
Step 4: Output the pixel at location r_0. See Eq. (10).
Output: B_{o1,o2}^{SP}.

Figure 4 shows a realistic example of the four different pooling methods, and Algorithm 1 presents the pseudocode of SP. Take the 3 × 3 block B_{1,1} (the red rectangle in Figure 4) as an example: L2P generates 6.98 as output, while AP and MP yield 5.99 and 9.9, respectively. Meanwhile, SP first generates the PM matrix,

V_PM = [0.18 0.18 0.03; 0.05 0.10 0.01; 0.09 0.18 0.18] (11)

and a sample location vector is drawn as r_0 = (1, 2). Therefore, the output of SP is B_{1,1}^{SP} = b(1, 2) = 9.8.

Figure 4. Comparison of four different pooling methods.

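The four SP steps of Algorithm 1 can be sketched in NumPy as follows; the block values are illustrative (chosen to resemble the Figure 4 example, not copied from it), and the function name is an assumption:

```python
import numpy as np

def stochastic_pool(block: np.ndarray, rng: np.random.Generator) -> float:
    """Stochastic pooling (Algorithm 1) on one Q1 x Q2 block of
    non-negative activations: sample one pixel with probability
    proportional to its value, per Eqs. (7)-(10)."""
    flat = block.ravel().astype(np.float64)
    total = flat.sum()
    if total == 0:                       # all-zero block
        return 0.0
    pm = flat / total                    # probability map V_PM, Eq. (7)
    idx = rng.choice(flat.size, p=pm)    # draw location r_0, Eqs. (8)-(9)
    return float(flat[idx])             # output b(x_r0, y_r0), Eq. (10)

# Illustrative 3x3 block (values resemble Figure 4 but are not exact):
block = np.array([[9.7, 9.8, 1.6],
                  [2.7, 5.4, 0.5],
                  [4.9, 9.7, 9.7]])
rng = np.random.default_rng(0)
pooled = stochastic_pool(block, rng)  # one of the block's pixel values
```

Unlike MP, repeated draws can return any pixel, but large activations are favored in proportion to their magnitude.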

3.3. Structural Optimization

How do we obtain the best network structure [16]? We design nine different configurations in this study; their structural hyperparameters are listed in Table 2. Two hyperparameters are considered: (i) the number of Conv layers, and (ii) the number of fully connected layers (FCLs). Both layer types are standard in CNNs, so we do not introduce them here due to the page limit.

Table 2. Structures of nine customized neural networks.

Configuration No. of Conv Layers No. of FCLs
I 2 1
II 2 2
III 2 3
IV 3 1
V 3 2
VI 3 3
VII 4 1
VIII 4 2
IX 4 3

(Bold means the best)

In the following experiments, we will observe that Configuration V, a five-layer customized neural network, gives the best performance. Here we briefly give its detailed structure in Table 3. The input is of size 256 × 256 × 1. The first Conv layer (Conv_1) is associated with a batch normalization (BN) layer and rectified linear unit (ReLU) activation. The parameters of Conv_1 are 32 kernels with size 3 × 3 and stride 2. Afterward, the first SP layer (SP_1) reduces the FM from 128 × 128 × 32 to 64 × 64 × 32.

Table 3. Detailed Structure of Network of Configuration V.

Layer Parameters FM
Input  256 × 256 × 1
Conv_1 (BN-ReLU) 32, 3x3 /2 128 × 128 × 32
SP_1   64 × 64 × 32
Conv_2 (BN-ReLU) 64, 3x3 /2   32 × 32 × 64
SP_2   16 × 16 × 64
Conv_3 (BN-ReLU) 128, 3x3 16 × 16 × 128
SP_3   8 × 8 × 128
Flatten 8192
FCL_1 100x8192, 100x1 100
FCL_2 2x100, 2x1 2
Output

After three Conv layers and three SP layers, the size of FM is 8 × 8 × 128. It is then flattened to a vector of 8192 neurons. With two FCLs of 100 and 2 hidden neurons, the neural network finally outputs whether TOF or HC. In all, our model is termed structurally optimized stochastic pooling convolutional neural network (SOSPCNN). The FM plot is portrayed in Figure 5.
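The feature-map sizes of Table 3 can be traced with a short sketch, under the assumption of "same"-padded 3 × 3 convolutions and 2 × 2 non-overlapping pooling (consistent with, though not explicitly stated in, the table):

```python
import math

def conv_same(h: int, w: int, stride: int):
    """Spatial size after a 'same'-padded convolution with the given
    stride: ceil(h/stride) x ceil(w/stride)."""
    return math.ceil(h / stride), math.ceil(w / stride)

def pool2(h: int, w: int):
    """2x2 non-overlapping pooling halves each spatial side."""
    return h // 2, w // 2

h, w = 256, 256
h, w = conv_same(h, w, stride=2)   # Conv_1: 32 kernels  -> 128 x 128 x 32
h, w = pool2(h, w)                 # SP_1               ->  64 x  64 x 32
h, w = conv_same(h, w, stride=2)   # Conv_2: 64 kernels  ->  32 x  32 x 64
h, w = pool2(h, w)                 # SP_2               ->  16 x  16 x 64
h, w = conv_same(h, w, stride=1)   # Conv_3: 128 kernels ->  16 x  16 x 128
h, w = pool2(h, w)                 # SP_3               ->   8 x   8 x 128
flat = h * w * 128                 # flattened vector: 8192 neurons
```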

Figure 5. FM Plot.


3.4. Multiple-way Data Augmentation

The relatively small dataset (40 + 40 = 80 images) may bring an overfitting problem. To avoid overfitting, data augmentation (DA) [17] is a powerful tool because it can generate synthetic images on the training set [18]. Zhu, W. (2021) [19] presented an 18-way DA method and proved that it works better than the traditional DA approach. Its diagram is shown in Figure 6. The differences between DA and multiple-way DA (MDA) are: (i) MDA uses a combination of different DA methods on the training set; (ii) MDA is a modular design, so users can easily add or remove particular DA methods.

Figure 6. Diagram of Multiple-way Data Augmentation.


Suppose we have the raw training image r(k), where k represents the image index. First, P1 different DA methods displayed in Figure 6 are applied to r(k). Letting Z_p, p = 1, …, P1, denote each DA operation, we obtain P1 augmented datasets from the raw image r(k):

Z_p[r(k)], p = 1, …, P1. (12)

Let P2 stand for the size of generated new images for each DA method, thus,

|Z_p[r(k)]| = P2. (13)

Second, horizontal mirrored image is produced by:

r^(h)(k) = β1[r(k)] (14)

where β1 means horizontal mirror function.

Third, all P1 different DA methods are carried out on the mirrored image r^(h)(k), producing P1 new datasets:

Z_p[r^(h)(k)], p = 1, …, P1, with |Z_p[r^(h)(k)]| = P2. (15)

Fourth, the raw image r(k), the mirrored image r^(h)(k), all P1-way DA results of the raw image Z_p[r(k)], and all P1-way DA results of the mirrored image Z_p[r^(h)(k)] are combined. The final generated dataset from r(k) is defined as G(k):

r(k) → G(k) = β2{ r(k), r^(h)(k), Z_1[r(k)], Z_1[r^(h)(k)], …, Z_{P1}[r(k)], Z_{P1}[r^(h)(k)] } (16)

where β2 stands for the concatenation function. Let the augmentation factor be P3, the number of images in G(k); we obtain

P3 = |G(k)| / |r(k)| = [(1 + P1 × P2) × 2] / 1 = 2 × P1 × P2 + 2 (17)

Algorithm 2 recaps the pseudocode of this 18-way DA. We set P1 = 9 to achieve an 18-way DA. We also set P2 = 30, thus P3 = 542, indicating each raw training image will generate 542 images, which include the raw image r(k) itself.

Algorithm 2. Pseudocode of proposed 18-way DA on k-th training image.

Input: Import the raw preprocessed k-th training image r(k).
Step 1: P1 geometric, photometric, or noise-injection DA transforms Z_p are applied to r(k), yielding Z_p[r(k)], p = 1, …, P1 (see Eq. (12)); each augmented dataset contains P2 new images (see Eq. (13)).
Step 2: A horizontal mirror image is produced as r^(h)(k) = β1[r(k)]. See Eq. (14).
Step 3: The P1 DA methods are carried out on r^(h)(k), yielding Z_p[r^(h)(k)], p = 1, …, P1. See Eq. (15).
Step 4: r(k), r^(h)(k), Z_p[r(k)], and Z_p[r^(h)(k)], p = 1, …, P1, are merged via β2. See Eq. (16).
Output: A new dataset G(k) with P3 = 2 × P1 × P2 + 2 images. See Eq. (17).
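A minimal sketch of this four-step assembly (identity placeholders stand in for the real DA operators, which are not specified here) reproduces the augmentation factor of Eq. (17):

```python
def augmentation_factor(p1: int, p2: int) -> int:
    """Eq. (17): the raw image and its mirror each yield P1 x P2
    augmented images, plus the two source images themselves."""
    return 2 * p1 * p2 + 2

def multiway_augment(raw, da_ops, p2, mirror):
    """Sketch of Eq. (16): concatenate the raw image, its horizontal
    mirror, and P2 outputs of each DA operation applied to both."""
    mirrored = mirror(raw)
    out = [raw, mirrored]
    for op in da_ops:            # P1 DA operations (9 in the paper)
        for _ in range(p2):      # P2 new images each (30 in the paper)
            out.append(op(raw))
            out.append(op(mirrored))
    return out

# Toy check with identity "DA" ops on a placeholder image:
dataset = multiway_augment("img", [lambda x: x] * 9, 30, lambda x: x)
# len(dataset) equals augmentation_factor(9, 30) = 542
```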

3.5. Implementation and Grad-CAM

Q-fold cross-validation [20] is employed. The whole dataset is divided into Q folds (see Figure 7). At the q-th trial, 1 ≤ q ≤ Q, the q-th fold is used as the test set, and the remaining Q − 1 folds [1, …, q − 1, q + 1, …, Q] serve as the training set [21]. In this study, we set Q = 10, i.e., a 10-fold cross-validation. Furthermore, we run the 10-fold cross-validation 10 times, i.e., 10×10-fold cross-validation.
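The Q-fold split can be sketched as follows; the fold-assignment scheme here is illustrative, since the paper does not specify how images are assigned to folds:

```python
def q_fold_splits(n_samples: int, q: int = 10):
    """Yield (train, test) index pairs for Q-fold cross-validation:
    at the q-th trial the q-th fold is the test set and the remaining
    Q - 1 folds form the training set."""
    idx = list(range(n_samples))
    folds = [idx[i::q] for i in range(q)]  # interleaved fold assignment
    for t in range(q):
        train = [i for f, fold in enumerate(folds) if f != t for i in fold]
        yield train, folds[t]

# 80 images, Q = 10: each test fold holds 8 images, each training set 72
splits = list(q_fold_splits(80, 10))
```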

Figure 7. Q-fold cross-validation.


Gradient-weighted class activation mapping (Grad-CAM) [22] is employed to explain how our model makes its decision in classification. Grad-CAM utilizes the gradient of the classification score with respect to the convolutional features determined by the network to understand which parts of the image are most important for classification. Grad-CAM is a generalization of the class activation mapping (CAM) method [23] to a broader range of CNN models since the original CAM relies on a fully convolutional neural network structure. The output of SP_3 (See Table 3) is used as the feature layer for Grad-CAM.

Mathematically, suppose our classification network has output y^c, the score for class c. We would like to compute the Grad-CAM map for a layer with K feature maps A_{i,j}^k, where (i, j) indexes the pixels. The neuron importance weight is obtained as

α_k^c = (1/N) Σ_i Σ_j ∂y^c / ∂A_{i,j}^k (18)

where N stands for the total number of pixels in the feature map. The Grad-CAM map is a weighted combination of the feature maps followed by a ReLU:

M = ReLU( Σ_k α_k^c A^k ) (19)

The Grad-CAM map M is then upsampled to the size of input data.
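Eqs. (18)-(19) can be sketched in NumPy; the final upsampling to the input size is omitted, and the random feature maps and gradients below are placeholders for the SP_3 activations and the backpropagated gradients:

```python
import numpy as np

def grad_cam(feature_maps: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """Grad-CAM of Eqs. (18)-(19). Both inputs have shape (K, H, W):
    K feature maps A^k and the gradients dy^c / dA^k."""
    n = feature_maps.shape[1] * feature_maps.shape[2]
    # Eq. (18): global-average the gradients over the N = H*W pixels
    alpha = grads.reshape(grads.shape[0], -1).sum(axis=1) / n
    # Eq. (19): weighted sum of feature maps, then ReLU
    m = np.einsum("k,khw->hw", alpha, feature_maps)
    return np.maximum(m, 0.0)

rng = np.random.default_rng(1)
fmaps = rng.standard_normal((128, 8, 8))   # SP_3 output size in Table 3
grads = rng.standard_normal((128, 8, 8))
heat = grad_cam(fmaps, grads)              # 8 x 8 non-negative heatmap
```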

3.6. Measures

The confusion matrix over 10 runs of 10-fold cross-validation takes the form

G = [g(1,1), g(1,2); g(2,1), g(2,2)] = [TP, FN; FP, TN], (20)

Note that FN = FP = 0 for a perfect classification. The meanings of P, N, TP, FP, TN, and FN are itemized in Table 4.

Table 4. Meanings in measures.

Abbreviation Full Form Symbol Meaning
P Positive TOF
N Negative HC
TP True Positive  g(1,1) TOF images are classified correctly.
FP False Positive  g(2,1) HC images are wrongly classified as TOF.
TN True Negative  g(2,2) HC images are classified correctly.
FN False Negative  g(1,2) TOF images are wrongly classified as HC.

Nine measures are used: sensitivity, specificity, precision, accuracy, F1 score, Matthews correlation coefficient (MCC), Fowlkes–Mallows index (FMI), receiver operating characteristic (ROC), and area under the curve (AUC).

The first four measures are defined as

Sen = g(1,1) / [g(1,1) + g(1,2)]
Spc = g(2,2) / [g(2,2) + g(2,1)]
Prc = g(1,1) / [g(1,1) + g(2,1)]
Acc = [g(1,1) + g(2,2)] / [g(1,1) + g(2,2) + g(1,2) + g(2,1)] (21)

F1, MCC [24], and FMI [25] are defined as:

F1 = 2 × Sen × Prc / (Sen + Prc) = 2 × g(1,1) / [2 × g(1,1) + g(1,2) + g(2,1)]
MCC = [g(1,1) × g(2,2) − g(2,1) × g(1,2)] / √{[g(1,1) + g(2,1)] × [g(1,1) + g(1,2)] × [g(2,2) + g(2,1)] × [g(2,2) + g(1,2)]}
FMI = √(Sen × Prc) = √( g(1,1) / [g(1,1) + g(1,2)] × g(1,1) / [g(1,1) + g(2,1)] ) (22)

The above measures are reported in mean and standard deviation (MSD) format. Furthermore, ROC is a curve that characterizes a binary classifier under varying discrimination thresholds [26]. The ROC curve is created by plotting the sensitivity against 1 − specificity. The AUC is calculated based on the ROC curve [27].
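The measures of Eqs. (21)-(22) can be computed directly from the confusion-matrix counts; as a check, plugging in the overall 10×10-fold counts reported in Figure 8 (TP = 369, FN = 31, FP = 29, TN = 371) reproduces the stated sensitivity and specificity:

```python
import math

def measures(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Eqs. (21)-(22) evaluated on the confusion matrix G of Eq. (20)."""
    sen = tp / (tp + fn)
    spc = tn / (tn + fp)
    prc = tp / (tp + fp)
    acc = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * sen * prc / (sen + prc)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    fmi = math.sqrt(sen * prc)
    return dict(sen=sen, spc=spc, prc=prc, acc=acc, f1=f1, mcc=mcc, fmi=fmi)

m = measures(369, 31, 29, 371)   # sen = 0.9225, spc = 0.9275
```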

4. Experimental Results

4.1. Statistical Analysis

The result of the SOSPCNN model using Configuration V is itemized in Table 5. The model arrives at a performance with a sensitivity of 92.25 ± 2.19, a specificity of 92.75 ± 2.49, a precision of 92.79 ± 2.29, an accuracy of 92.50 ± 1.18, an F1 score of 92.48 ± 1.17, an MCC of 85.06 ± 2.38, and an FMI of 92.50 ± 1.17.

Table 5. Statistical Analysis of SOSPCNN model.

Run Sen Spc Prc Acc F1 MCC FMI
1 95.00 92.50 92.68 93.75 93.83 87.53 93.83
2 92.50 90.00 90.24 91.25 91.36 82.53 91.36
3 95.00 92.50 92.68 93.75 93.83 87.53 93.83
4 90.00 92.50 92.31 91.25 91.14 82.53 91.15
5 90.00 95.00 94.74 92.50 92.31 85.11 92.34
6 90.00 97.50 97.30 93.75 93.51 87.75 93.58
7 92.50 95.00 94.87 93.75 93.67 87.53 93.68
8 92.50 90.00 90.24 91.25 91.36 82.53 91.36
9 90.00 92.50 92.31 91.25 91.14 82.53 91.15
10 95.00 90.00 90.48 92.50 92.68 85.11 92.71
MSD 92.25
±2.19
92.75
±2.49
92.79
±2.29
92.50
±1.18
92.48
±1.17
85.06
±2.38
92.50
±1.17

Figure 8 shows the confusion matrix of the 10×10-fold cross-validation, where TP = 369, FN = 31, TN = 371, and FP = 29, indicating that 31 TOF images are wrongly classified as HC while 29 HC images are misclassified as TOF. Hence, the sensitivity is 369/(369 + 31) = 92.25% and the specificity is 371/(29 + 371) = 92.75%.

Figure 8. Confusion matrix of 10x10-fold cross-validation (Here Class 1 & 2 stand for ToF and HC, respectively).


4.2. Configuration Comparison

We compare the nine configurations (see Table 2). The validation is the same as in the previous experiment. Due to the page limit, the detailed statistical analysis is not shown. The ROC curves are displayed in Figure 9. The AUC values of the nine networks are 0.9502, 0.9511, 0.9504, 0.9532, 0.9587, 0.9577, 0.9360, 0.9419, and 0.9389, as shown in Figure 10. We can observe from Figure 10 that the best network is Configuration V, whose structure is shown in Table 3.

Figure 9. Comparison of Nine Configurations.


Figure 10. Bar plot of AUC against Nine Configurations.


4.3. Effect of Multiple-way Data Augmentation

Figure 11 shows the multiple-way DA results when we take Figure 2(a) as the raw training example. Due to the page limit, the multiple-way DA results on the horizontally mirrored image are not displayed. As we can see from Figure 11, multiple-way DA increases the diversity of the training images.

Figure 11. Illustration of Multiple-way Data Augmentation.


If we remove the multiple-way data augmentation from our model, performance decreases, as shown in Table 6, where MSD stands for mean and standard deviation. Comparing Table 5 with Table 6, we can observe that multiple-way DA is effective in improving classification performance. The reason is that it helps our model resist overfitting by enhancing the diversity of the training set.

Table 6. Statistical Analysis without Multiple-way Data Augmentation.

Run Sen Spc Prc Acc F1 MCC FMI
1 87.50 90.00 89.74 88.75 88.61 77.52 88.61
2 90.00 87.50 87.80 88.75 88.89 77.52 88.90
3 87.50 87.50 87.50 87.50 87.50 75.00 87.50
4 90.00 87.50 87.80 88.75 88.89 77.52 88.90
5 85.00 90.00 89.47 87.50 87.18 75.09 87.21
6 85.00 90.00 89.47 87.50 87.18 75.09 87.21
7 82.50 92.50 91.67 87.50 86.84 75.38 86.96
8 87.50 87.50 87.50 87.50 87.50 75.00 87.50
9 85.00 90.00 89.47 87.50 87.18 75.09 87.21
10 82.50 92.50 91.67 87.50 86.84 75.38 86.96
MSD 86.25±2.70 89.50±1.97 89.21±1.58 87.87±0.60 87.66±0.82 75.86±1.16 87.70±0.79

4.4. Explainability

Figure 12 shows the manual delineation and the heatmap of Figure 2(a) via Grad-CAM, as described in Section 3.5. The manual delineation shows that the radiologist makes the "TOF" diagnosis based on all the areas of the abnormal heart, while the heatmap shows that the proposed SOSPCNN model also focuses on the heart region rather than the surrounding tissues and background areas.

Figure 12. Heatmap of one TOF image.


4.5. Comparison with State-of-the-art Approaches

We compare the proposed SOSPCNN model with three other approaches: MC [8], 3DCNN [9], and VCCNN [10]. The results are shown in Table 7. Note that some comparison methods are not suitable for our dataset, so we modify them to adapt to our dataset.

Table 7. Comparison with state-of-the-art approaches.

Approach Sen Spc Prc Acc F1 MCC FMI
MC [8] 86.25±3.58 80.75±3.55 81.88±2.32 83.50±0.79 83.92±0.97 67.25±1.56 84.00±0.98
3DCNN [9] 91.00±3.16 89.50±3.29 89.77±2.70 90.25±1.42 90.32±1.42 80.63±2.79 90.35±1.41
VCCNN [10] 90.75±1.69 90.00±2.36 90.14±1.95 90.38±0.84 90.41±0.78 80.80±1.64 90.43±0.76
SOSPCNN (Ours) 92.25±2.19 92.75±2.49 92.79±2.29 92.50±1.18 92.48±1.17 85.06±2.38 92.50±1.17

The error bar comparison is drawn in Figure 13, which clearly shows that the proposed SOSPCNN outperforms all three comparative approaches. The reason is three-fold: (i) we use stochastic pooling to replace traditional max-pooling; (ii) we use structural optimization to determine the optimal structure of our SOSPCNN model; (iii) multiple-way DA is included to increase the diversity of training images. In the future, more advanced techniques [28-30] will be tested and integrated into our model.

Figure 13. Error Bar Comparison.


4.6. Desktop and Web Apps

MATLAB App Designer is used to create a professional application for both desktop and web. The input to the app is any cardiovascular CT image, and the aforementioned SOSPCNN model is embedded in it. Figure 14(a) displays the graphical user interface (GUI) of the standalone desktop app. Users can upload their own images, and the software shows the result by turning a knob to the correct label: TOF, HC, or None.

Figure 14. GUI of developed apps.


Figure 14(b) shows the GUI of the web app, accessed through a "Google Chrome" web browser. The web app is based on a client-server model [31], i.e., the user is provided services through an off-site server hosted by a third-party cloud service, Microsoft Azure in our study. Our online web app can assist hospital clinicians in making decisions remotely and effectively.

5. Conclusion

This paper proposes a model and accompanying apps for TOF recognition. The model is termed structurally optimized stochastic pooling convolutional neural network (SOSPCNN), with explainability achieved by Grad-CAM. The results of ten runs of 10-fold cross-validation show that the SOSPCNN model yields a sensitivity of 92.25±2.19, a specificity of 92.75±2.49, a precision of 92.79±2.29, an accuracy of 92.50±1.18, an F1 score of 92.48±1.17, an MCC of 85.06±2.38, an FMI of 92.50±1.17, and an AUC of 0.9587. Further, we develop both desktop and web apps to realize this SOSPCNN model.

The shortcomings of our method are as follows: (i) Our model is trained on a small dataset; (ii) Our model does not go through strict medical verification; (iii) Our model only considers TOF and HC.

Therefore, we shall attempt to address the above three weak points in the future. We shall try to collect more TOF and HC cardiovascular CT images. We shall invite clinicians to use our web app and provide feedback so that we can continue to improve our model. We shall also try to collect data on other heart diseases, so that our model can identify more types of disease.

Acknowledgement

This paper is partially supported by Royal Society International Exchanges Cost Share Award, UK (RP202G0230); Medical Research Council Confidence in Concept Award, UK (MC_PC_17171); Hope Foundation for Cancer Research, UK (RM60G0680); British Heart Foundation Accelerator Award, UK; Sino-UK Industrial Fund, UK (RP202G0289); Global Challenges Research Fund (GCRF), UK (P202PF11).

Footnotes

Conflict of Interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Contributor Information

Shui-Hua Wang, Email: sw546@le.ac.uk.

Kaihong Wu, Email: pumcwu@sina.com.

Tianshu Chu, Email: t.chu6688@gmail.com.

Steven L. Fernandes, Email: stevenfernandes@creighton.edu.

Qinghua Zhou, Email: qz106@le.ac.uk.

References

  • 1.Carli D, et al. Atypical microdeletion 22q11.2 in a patient with tetralogy of Fallot. Journal of Genetics. 2021;100(1):4.:5. [PubMed] [Google Scholar]
  • 2.Ghaderian M, et al. Clinical Outcome of Right Ventricular Outflow Tract Stenting Versus Blalock-Taussig Shunt in Tetralogy of Fallot: A systematic Review and Meta-Analysis. Current Problems in Cardiology. 2021;46(3):14.:100643. doi: 10.1016/j.cpcardiol.2020.100643. [DOI] [PubMed] [Google Scholar]
  • 3.Uecker M, et al. Gravitational Autoreposition for Staged Closure of Omphaloceles. European Journal of Pediatric Surgery. 2020;30(1):45–50. doi: 10.1055/s-0039-1693727. [DOI] [PubMed] [Google Scholar]
  • 4.Cambronero-Cortinas E, et al. Predictors of atrial tachyarrhythmias in adults with congenital heart disease. Kardiologia Polska. 2020;78(12):1262–1270. doi: 10.33963/KP.15644. [DOI] [PubMed] [Google Scholar]
  • 5.Ashraf T, et al. Coronary Artery Anomalies in Tetralogy of Fallot Patients Undergoing CT Angiography at a Tertiary Care Hospital. Cureus. 2020;12(9):7.:e10723. doi: 10.7759/cureus.10723. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Engbersen M, et al. Comparison of whole-body MRI and 68Ga-DOTATATE PET-CT findings in patients with suspected peritoneal metastases from neuroendocrine tumors. Journal of Neuroendocrinology. 2021;33:120. [Google Scholar]
  • 7.Shan F, et al. Abnormal lung quantification in chest CT images of COVID-19 patients with deep learning and its application to severity prediction. Medical Physics. 2021;13 doi: 10.1002/mp.14609. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Ye DH, et al. In: Morphological Classification: Application to Cardiac MRI of Tetralogy of Fallot, in Functional Imaging and Modeling of the Heart. Metaxas DN, Axel L, editors. Springer-Verlag Berlin; Berlin: 2011. pp. 180–187. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Giannakidis A, et al. In: Yetongnon K, et al., editors. Fast Fully Automatic Segmentation of the Severely Abnormal Human Right Ventricle from Cardiovascular Magnetic Resonance Images using a Multi-scale 3D Convolutional Neural Network; 2016 12th International Conference on Signal-Image Technology & Internet-Based Systems; New York. 2016. pp. 42–46. [Google Scholar]
  • 10.Tandon A, et al. Retraining Convolutional Neural Networks for Specialized Cardiovascular Imaging Tasks: Lessons from Tetralogy of Fallot. Pediatric Cardiology. 2021;12 doi: 10.1007/s00246-020-02518-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Klimo M, et al. Deep Neural Networks Classification via Binary Error-Detecting Output Codes. Applied Sciences-Basel. 2021;11(8):18.:3563 [Google Scholar]
  • 12.Alfio VS, et al. Influence of Image TIFF Format and JPEG Compression Level in the Accuracy of the 3D Model and Quality of the Orthophoto in UAV Photogrammetry. Journal of Imaging. 2020;6(5):22.:30. doi: 10.3390/jimaging6050030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Schmid S, et al. A new approach for automated measuring of the melt pool geometry in laser-powder bed fusion. Progress in Additive Manufacturing. 2021;6(2):269–279. [Google Scholar]
  • 14.Hegde S, et al. PIG-Net: Inception based deep learning architecture for 3D point cloud segmentation*. Computers & Graphics-UK. 2021;95:13–22. [Google Scholar]
  • 15.Vrzal T, et al. DeepRel: Deep learning-based gas chromatographic retention index predictor. Analytica Chimica Acta. 2021;1147:64–71. doi: 10.1016/j.aca.2020.12.043. [DOI] [PubMed] [Google Scholar]
  • 16.Pokhrel J, et al. Statistical model for fragility estimates of offshore wind turbines subjected to aero-hydro dynamic loads. Renewable Energy. 2021;163:1495–1507. [Google Scholar]
  • 17.Jung KC, et al. Advanced deep learning model-based impact characterization method for composite laminates. Composites Science and Technology. 2021;207:10.:108713 [Google Scholar]
  • 18.Rahman A, et al. A machine learning framework for predicting the shear strength of carbon nanotube-polymer interfaces based on molecular dynamics simulation data. Composites Science and Technology. 2021;207:8.:108627 [Google Scholar]
  • 19.Zhu W. ANC: Attention Network for COVID-19 Explainable Diagnosis Based on Convolutional Block Attention Module. Computer Modeling in Engineering & Sciences. 2021 doi: 10.32604/cmes.2021.015807. [DOI] [Google Scholar]
  • 20.Akbari H, et al. Classification of normal and depressed EEG signals based on centered correntropy of rhythms in empirical wavelet transform domain. Health Information Science and Systems. 2021;9(1):15.:9. doi: 10.1007/s13755-021-00139-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Rajapandy M, et al. An improved unsupervised learning approach for potential human microRNA-disease association inference using cluster knowledge. Network Modeling and Analysis in Health Informatics and Bioinformatics. 2021;10(1):16.:21 [Google Scholar]
  • 22.Selvaraju RR, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. International Journal of Computer Vision. 2020;128(2):336–359. [Google Scholar]
  • 23.Zhou B, et al. Learning Deep Features for Discriminative Localization; IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Las Vegas, NV, USA. 2016. pp. 2921–2929. [Google Scholar]
  • 24.Alahmadi A, et al. An explainable algorithm for detecting drug-induced QT-prolongation at risk of torsades de pointes (TdP) regardless of heart rate and T-wave morphology. Computers in Biology and Medicine. 2021;131:21.:104281. doi: 10.1016/j.compbiomed.2021.104281. [DOI] [PubMed] [Google Scholar]
  • 25.Coipan CE, et al. Concordance of SNP- and allele-based typing workflows in the context of a large-scale international Salmonella Enteritidis outbreak investigation. Microbial Genomics. 2020;6(3):14.:000318. doi: 10.1099/mgen.0.000318. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Ali I, et al. A validation study of the 4-variable and 8-variable kidney failure risk equation in transplant recipients in the United Kingdom. Bmc Nephrology. 2021;22(1):8.:57. doi: 10.1186/s12882-021-02259-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Wubalem A. Landslide susceptibility mapping using statistical methods in Uatzau catchment area, northwestern Ethiopia. Geoenvironmental Disasters. 2021;8(1):21.:1 [Google Scholar]
  • 28.Zhang Y-P, et al. Alzheimer’s disease multiclass diagnosis via multimodal neuroimaging embedding feature selection and fusion. Information Fusion. 2021;66:170–183. [Google Scholar]
  • 29.Zhang YP, et al. Clustering by transmission learning from data density to label manifold with statistical diffusion. Knowledge-Based Systems. 2020;193:14.:105330 [Google Scholar]
  • 30.Zhang YP, et al. Fast Exemplar-Based Clustering by Gravity Enrichment Between Data Objects. IEEE Transactions on Systems Man Cybernetics-Systems. 2020;50(8):2996–3009. [Google Scholar]
  • 31.Salama HM, et al. CSMCSM: Client-Server Model for Comprehensive Security in MANETs. International Journal of Information Security and Privacy. 2021;15(1):44–64. [Google Scholar]
