Improving convolutional neural networks performance for image classification using test time augmentation: a case study using MURA dataset

Ibrahem Kandel; Mauro Castelli

doi:10.1007/s13755-021-00163-7

. 2021 Jul 31;9(1):33. doi: 10.1007/s13755-021-00163-7

Improving convolutional neural networks performance for image classification using test time augmentation: a case study using MURA dataset

Ibrahem Kandel ^1,^✉, Mauro Castelli ¹

PMCID: PMC8325732 PMID: 34349982

Abstract

Bone fractures are one of the main causes to visit the emergency room (ER); the primary method to detect bone fractures is using X-Ray images. X-Ray images require an experienced radiologist to classify them; however, an experienced radiologist is not always available in the ER. An accurate automatic X-Ray image classifier in the ER can help reduce error rates by providing an instant second opinion to the emergency doctor. Deep learning is an emerging trend in artificial intelligence, where an automatic classifier can be trained to classify musculoskeletal images. Image augmentations techniques have proven their usefulness in increasing the deep learning model's performance. Usually, in the image classification domain, the augmentation techniques are used during training the network and not during the testing phase. Test time augmentation (TTA) can increase the model prediction by providing, with a negligible computational cost, several transformations for the same image. In this paper, we investigated the effect of TTA on image classification performance on the MURA dataset. Nine different augmentation techniques were evaluated to determine their performance compared to predictions without TTA. Two ensemble techniques were assessed as well, the majority vote and the average vote. Based on our results, TTA increased classification performance significantly, especially for models with a low score.

Keywords: Image classification, Convolutional neural networks, Transfer learning, Test time augmentation, Deep learning, Ensemble learning

Introduction

Musculoskeletal X-ray images are crucial for fracture classification. Usually, when a patient has an accident or suspects a fracture, the patient goes to the emergency room (ER), where the ER doctor will first do an X-ray to detect fractures. The misclassification rate of X-ray images in ER is very high due to several factors, like the fact that the ER doctor classifying the X-ray is not an experienced radiologist and the rapidness of the process that leads to mistakes [1]. An automatic classifier to assist the doctor in classifying X-ray images can be a great help and can reduce the error rate [2]. Deep learning is a subfield of artificial intelligence composed mainly of artificial neural networks (ANN). Convolutional neural networks (CNN) are ANN with at least one convolution layer. Due to its robustness and its state-of-the-art (SOTA) results, it becomes the default classifier for the computer vision domain. To accurately train a CNN, usually, enormous image datasets are required. In the medical field, it is usually impossible to find a dataset with millions of images. Many methods were introduced in the literature to tackle this problem, like using transfer learning [3–6] or image augmentation techniques. Image augmentation uses several iterations from the same image to increase the dataset's size and train the model on different image transformations. Geometric transformations, among other techniques, were introduced. Usually, image augmentation is used for image classification during the training time but not during the prediction time (test time). Test time augmentation (TTA) refers to the usage of image augmentation techniques during prediction time to increase the models' robustness.

As pointed out by Shorten and Khoshgoftaar [7], image augmentation can help in unbalanced problems by increasing the number of observations in underrepresented classes [8–10]. Other authors used image augmentation during the training phase to increase classifier performance [11, 12]. Rane et al. [13] investigated the effect of ensemble learning on classifying histopathology images. They used TTA to improve the model robustness and they applied the same nine augmentation techniques for training and testing. The authors averaged the results of the TTA operations into one final score. However, they reported only the results using TTA but did not report the model's results without TTA, thus making it difficult to understand the potential of TTA. Wang et al. [14] used TTA to estimate the model's uncertainty for segmenting fetal brain images. They reported that TTA did improve the segmentation results as long as it can calculate the segmentation model uncertainty.

Amiri et al. [15] used TTA to improve the performance of breast image segmentation. In their work, they applied a shifting augmentation technique with values ranging from − 25 pixels to + 25 pixels. Experimental results showed that TTA provides a robust method to determine the stability of the detector. Sigurthorsdottir et al. [16] used TTA to increase CNN's and RNN's performance in classifying ECG signals. They used ten different augmentation techniques and then took the average of these results as a final score. They reported that TTA did improve the results of the model compared to the model without TTA. Wang et al. [17] used TTA to improve the segmentation of brain tumor images. They considered flipping, rotation, and scaling as augmentation techniques and they tested the effect of TTA on 3D UNet, WNet, and cascaded networks. In all the experiments, TTA did improve the results compared to the same models without TTA.

Typically, the TTA is used in image segmentation, and as far as we know, there are very few studies that thoroughly studied the effect of TTA specifically for image classification.

In this paper, we investigated the usage of TTA for increasing the performance of image classification. To do so, we applied nine different geometric techniques and assessed their performance. Also, we combined the prediction of these nine transformations by using average voting and majority voting techniques.

The rest of the paper is organized as follows: Sect. 2 presents the methodology and the dataset used. In Sect. 3, we present the results achieved. In Sect. 4, we present a discussion about the results obtained. In Sect. 5, we conclude the paper by summarizing the main findings of this work.

Methodology

In this section, we discuss the methods used in this paper.

Convolutional Neural Networks

Convolutional neural networks (CNNs) have become the de-facto algorithm for many computer vision tasks in recent years. One of the many advantages of CNNs is the concept of weight sharing, where instead of connecting all neurons (like in fully connected neural networks), a kernel can be used to map the features. Using weight sharing decreases the network size significantly and make it more robust against overfitting. The convolution operation is the operation that distinguishes CNNs from others neural networks. The convolution is a linear mathematical operation where a kernel (filter) is used to map the input by multiplying the inputs by a set of weights. The convolution operation result is a feature map that will be used instead of the input. The convolution operation is shown in Eq. (1):

O [i, j] = F (u, v) * I (i, j) = \sum_{u}^{} \sum_{v}^{} \sum_{c \in \{R, G, B\}} F_{c} (u, v) ⊙ I_{c} (i + u, j + v)

where $I (.)$ is the input image, $c$ is the color channels, $F (u, v)$ is the kernel, and $O (i, j)$ is the output feature map in the $(i, j)$ position.

There have been many architectures that were introduced in the literature. In this paper, we will use the following SOTA CNN: VGG19, InceptionV3, ResNet50, Xception, and DenseNet121. All the CNNs used were pre-trained on the ImageNet dataset.

VGG19

A group of researchers introduced VGG CNN [18] from Oxford university to participate in the ImageNet challenge in 2014, and it achieved second place. VGG consists of several convolution blocks separated by a dropout layer. Each convolution block consists of three or four convolutional layers sequentially connected. Several versions were introduced by the authors of the VGG, which varies in the number of convolution layers. We will use the VGG19 version.

InceptionV3

InceptionV3 CNN [19] was introduced by a group of researchers from Google to participate in the ImageNet Challenge in the same year as VGG. InceptionV3 achieved a higher result than VGG and achieved first place. One of the main differences between them is the inception module's presence to decrease the computational power needed and capture different aspect ratios from the same image. There are several versions of the Inception networks. In this paper, we will use the InceptionV3 version.

ResNet50

ResNet CNN [20] was introduced by a group of researchers from Microsoft to participate in the ImageNet challenge in 2015, and it achieved first place in that year. One of the main perks of ResNet that separated it from other networks is the presence of residual connections. The authors have noticed that the performance deteriorates rapidly by increasing the CNN's depth, mainly because of the vanishing gradient problem. The authors proposed a connection that will act as a shortcut connection that will escape several layers each time. There are several versions of the ResNet networks. In this paper, we will use the ResNet50 version.

Xception

Xception CNN [21] was introduced by Francois Chollet in 2017. The Xception network was inspired by both the Inception module and the residual connection. The author replaced the conventional convolution layer with a depthwise convolution layer, which significantly decreased the computational power needed to train the network. The performance obtained by the Xception network for the ImageNet dataset is better than both the VGG19 and InceptionV3 and comparable to ResNet50.

DenseNet121

DenseNet CNN [22] was introduced in 2017. The residual connection from ResNet CNN inspired the DenseNet network. In DenseNet CNN, to overcome the vanishing gradient problem and reuse the subsequent convolution layers' features, the authors densely connected the subsequent convolution layers. The authors concatenated the results instead of adding them like the authors of ResNet. The result of DenseNet for the ImageNet dataset is higher than all the CNN mentioned above.

Test time augmentation

Test time augmentation (TTA) refers to the use of many variants of images during test time to provide different predictions for the same image [7, 23]. The results of the TTA can be combined in various ways, like taking the average vote or taking the majority vote of all the variants. Nine different augmentation techniques were studied in this paper, namely, horizontal flip; vertical flip; $40 %$ zooming; $180^{\circ}$ rotation; horizontal flip with vertical flip (H_V); horizontal flip with rotation (H_R); vertical flip with rotation (V_R); horizontal flip, vertical flip, and rotation (H_V_R); and combining all four methods, horizontal flip, vertical flip, rotation, and zooming (H_V_R_Z). We studied each technique's results alone and two combination methods: average votes and majority votes. The average vote considers the average of the scores obtained by a CNN network after the augmentation techniques (i.e., in our case, the average of nine values), and it outputs the predicted label based on this value. On the other hand, in the case of the majority vote, the predictions (obtained by each augmentation technique) for each label are summed, and the label with the majority vote is predicted. Thus, the former combination strategy considers the scores of the CNNs, while the latter technique works by directly considering the predicted labels.

Dataset and the evaluation metric

The dataset used in this paper is the MURA dataset [24], a publicly available dataset composed of X-Ray images of seven different upper extremities organs, namely, finger, wrist, hand, forearm, elbow, humerus, and shoulder. The size of the images is different and ranges from $117 \times 512$ pixels to $512 \times 512$ pixels. The authors of the dataset divided the images into two partitions: the training dataset and the testing dataset. The training dataset has 36,808 images, and the testing dataset has a total of 3,197 images. The MURA dataset is considered particularly challenging because of the inconsistency of the image's sizes, the presence of unbalanced classes in some organ datasets, and the small size of other organ datasets. A summary statistic about the MURA dataset is shown in Table 1. The evaluation metric proposed by the authors of the dataset is the Kappa metric [25]. The Kappa metric is a prevalent metric specially used for imbalanced classification problems. The Kappa metric ranges from [− 1,1] where $- 1$ means a completely random classifier and $+ 1$ means a perfect classifier. Kappa metric will be used to evaluate the results obtained to be consistent with other studies like [24, 26, 27]. The MURA dataset is available from http://arxiv.org/abs/1712.06957 (Fig. 1).

Table 1.

MURA dataset summary

Category	Training dataset		Test dataset
Category	Normal	Fractured	Normal	Fractured
Wrist	5765	3987	364	295
Shoulder	4211	4168	285	278
Hand	4059	1484	271	189
Finger	3138	1968	214	247
Elbow	2925	2006	235	230
Forearm	1164	661	150	151
Humerus	673	599	148	140
Total	21,935	14,873	1667	1530

Optimizer	Adam
Learning rate	0.0001
Loss function	Binary Cross-entropy
Early stopping	50 epochs
Batch size	64
Validation split	20%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.3944	0.3550	0.3705	0.3891	0.3840
Horizontal	0.3883 ± 0.32%	0.3475 ± 0.50%	0.3623 ± 0.49%	0.4009 ± 0.43%	0.3751 ± 0.39%
Vertical	0.3813 ± 0.37%	0.3388 ± 0.46%	0.3702 ± 0.20%	0.4086 ± 0.41%	0.4110 ± 0.51%
Rotate	0.4333 ± 0.48%	0.4273 ± 0.59%	0.4402 ± 0.63%	0.4883 ± 0.61%	0.4586 ± 0.57%
Zoom	0.4220 ± 0.67%	0.4646 ± 0.67%	0.4458 ± 0.68%	0.4497 ± 0.71%	0.4483 ± 0.72%
H_V_R_Z	0.4000 ± 0.65%	0.4246 ± 0.75%	0.4328 ± 0.64%	0.4607 ± 0.67%	0.4487 ± 0.79%
H_V	0.3765 ± 0.45%	0.3454 ± 0.67%	0.3572 ± 0.69%	0.4161 ± 0.64%	0.3875 ± 0.54%
H_R	0.4273 ± 0.65%	0.4371 ± 0.67%	0.4423 ± 0.76%	0.4774 ± 0.64%	0.4646 ± 0.57%
V_R	0.4313 ± 0.65%	0.4346 ± 0.67%	0.4522 ± 0.76%	0.4811 ± 0.62%	0.4505 ± 0.58%
H_V_R	0.4328 ± 0.62%	0.4251 ± 0.70%	0.4470 ± 0.75%	0.4748 ± 0.73%	0.4616 ± 0.61%
Average vote	0.4291 ± 0.37%	0.4546 ± 0.41%	0.4624 ± 0.29%	0.5055 ± 0.36%	0.4712 ± 0.31%
Majority vote	0.4240 ± 0.39%	0.4525 ± 0.38%	0.4570 ± 0.35%	0.4955 ± 0.39%	0.4682 ± 0.39%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.6387	0.6114	0.5784	0.5964	0.5686
Horizontal	0.6604 ± 0.31%	0.6095 ± 0.58%	0.5725 ± 0.67%	0.6118 ± 0.48%	0.5459 ± 0.71%
Vertical	0.6449 ± 0.41%	0.6182 ± 0.67%	0.6222 ± 0.69%	0.5784 ± 0.56%	0.5967 ± 0.58%
Rotate	0.6405 ± 0.59%	0.6040 ± 0.72%	0.5975 ± 0.93%	0.6126 ± 0.82%	0.6109 ± 0.69%
Zoom	0.6317 ± 0.75%	0.6282 ± 0.82%	0.5924 ± 0.62%	0.5970 ± 0.76%	0.5826 ± 0.75%
H_V_R_Z	0.6295 ± 0.65%	0.6090 ± 1.02%	0.5819 ± 0.81%	0.6121 ± 0.73%	0.5985 ± 0.94%
H_V	0.6552 ± 0.49%	0.6153 ± 0.69%	0.5980 ± 0.63%	0.5897 ± 0.65%	0.5897 ± 0.81%
H_R	0.6477 ± 0.63%	0.6058 ± 0.94%	0.5990 ± 0.84%	0.6177 ± 0.69%	0.6211 ± 0.69%
V_R	0.6473 ± 0.56%	0.6042 ± 0.83%	0.5935 ± 0.80%	0.6120 ± 0.68%	0.6213 ± 0.81%
H_V_R	0.6377 ± 0.58%	0.6058 ± 0.55%	0.5954 ± 0.81%	0.6090 ± 0.84%	0.6189 ± 0.81%
Average vote	0.6792 ± 0.33%	0.6677 ± 0.50%	0.6464 ± 0.48%	0.6424 ± 0.39%	0.6466 ± 0.44%
Majority vote	0.6835 ± 0.31%	0.6612 ± 0.55%	0.6449 ± 0.41%	0.6411 ± 0.47%	0.6460 ± 0.58%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.5552	0.5219	0.5750	0.4956	0.5420
Horizontal	0.5580 ± 0.49%	0.5502 ± 0.58%	0.6039 ± 0.53%	0.5021 ± 0.51%	0.5396 ± 0.5458%
Vertical	0.5465 ± 0.43%	0.5430 ± 0.65%	0.5792 ± 0.64%	0.4879 ± 0.38%	0.5459 ± 0.53%
Rotate	0.5493 ± 0.48%	0.5370 ± 0.85%	0.5154 ± 0.86%	0.4811 ± 0.65%	0.5187 ± 0.75%
Zoom	0.5122 ± 0.89%	0.5092 ± 0.86%	0.5293 ± 0.78%	0.4770 ± 0.84%	0.4837 ± 0.85%
H_V_R_Z	0.5214 ± 0.77%	0.5102 ± 0.99%	0.4955 ± 0.91%	0.4824 ± 0.96%	0.4802 ± 0.84%
H_V	0.5542 ± 0.53%	0.5664 ± 0.76%	0.5877 ± 0.63%	0.4895 ± 0.54%	0.5327 ± 0.63%
H_R	0.5461 ± 0.70%	0.5414 ± 0.82%	0.5284 ± 0.63%	0.4933 ± 0.74%	0.5235 ± 0.69%
V_R	0.541 ± 0.59%	0.5329 ± 0.86%	0.5150 ± 0.62%	0.4945 ± 0.70%	0.5161 ± 0.91%
H_V_R	0.5387 ± 0.54%	0.5391 ± 0.87%	0.5253 ± 0.75%	0.4911 ± 0.74%	0.5216 ± 0.78%
Average vote	0.5536 ± 0.34%	0.5760 ± 0.43%	0.5844 ± 0.42%	0.5143 ± 0.36%	0.5520 ± 0.40%
Majority vote	0.5599 ± 0.33%	0.5713 ± 0.56%	0.5899 ± 0.48%	0.5111 ± 0.46%	0.5526 ± 0.43%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.5749	0.6064	0.5551	0.6235	0.5250
Horizontal	0.5780 ± 0.31%	0.6101 ± 0.36%	0.5771 ± 0.34%	0.6246 ± 0.34%	0.5182 ± 0.29%
Vertical	0.5685 ± 0.30%	0.6138 ± 0.33%	0.5669 ± 0.32%	0.6182 ± 0.36%	0.5630 ± 0.32%
Rotate	0.6188 ± 0.45%	0.6174 ± 0.47%	0.6158 ± 0.41%	0.6085 ± 0.48%	0.6130 ± 0.44%
Zoom	0.6024 ± 0.45%	0.5929 ± 0.44%	0.5982 ± 0.44%	0.5946 ± 0.48%	0.5724 ± 0.49%
H_V_R_Z	0.5983 ± 0.39%	0.6063 ± 0.43%	0.5973 ± 0.50%	0.5998 ± 0.61%	0.5961 ± 0.60%
H_V	0.5700 ± 0.43%	0.6115 ± 0.46%	0.5873 ± 0.42%	0.6269 ± 0.38%	0.5496 ± 0.51%
H_R	0.6175 ± 0.40%	0.6134 ± 0.49%	0.6078 ± 0.43%	0.6074 ± 0.43%	0.6090 ± 0.56%
V_R	0.6194 ± 0.47%	0.6146 ± 0.49%	0.6068 ± 0.43%	0.6084 ± 0.42%	0.6116 ± 0.47%
H_V_R	0.6205 ± 0.41%	0.6158 ± 0.41%	0.6101 ± 0.46%	0.6114 ± 0.46%	0.6153 ± 0.54%
Average vote	0.6359 ± 0.24%	0.6487 ± 0.27%	0.6324 ± 0.27%	0.6382 ± 0.25%	0.6223 ± 0.27%
Majority vote	0.6334 ± 0.33%	0.6453 ± 0.29%	0.6292 ± 0.23%	0.6394 ± 0.25%	0.6159 ± 0.31%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.6078	0.6252	0.5908	0.6336	0.6123
Horizontal	0.6137 ± 0.30%	0.6027 ± 0.41%	0.5720 ± 0.38%	0.6438 ± 0.33%	0.6121 ± 0.35%
Vertical	0.6207 ± 0.30%	0.6244 ± 0.45%	0.5635 ± 0.47%	0.6587 ± 0.37%	0.6219 ± 0.41%
Rotate	0.6235 ± 0.49%	0.6253 ± 0.68%	0.6317 ± 0.65%	0.6665 ± 0.46%	0.6431 ± 0.56%
Zoom	0.5835 ± 0.60%	0.6146 ± 0.68%	0.6208 ± 0.71%	0.6181 ± 0.55%	0.6141 ± 0.64%
H_V_R_Z	0.6094 ± 0.47%	0.6067 ± 0.68%	0.6152 ± 0.71%	0.6573 ± 0.67%	0.6238 ± 0.55%
H_V	0.6181 ± 0.33%	0.6161 ± 0.63%	0.5586 ± 0.48%	0.6611 ± 0.52%	0.6158 ± 0.53%
H_R	0.6214 ± 0.44%	0.6376 ± 0.66%	0.6247 ± 0.66%	0.6653 ± 0.58%	0.6470 ± 0.54%
V_R	0.6205 ± 0.51%	0.6277 ± 0.55%	0.6277 ± 0.65%	0.6664 ± 0.49%	0.6494 ± 0.50%
H_V_R	0.6217 ± 0.42%	0.6363 ± 0.50%	0.6206 ± 0.65%	0.6668 ± 0.55%	0.6481 ± 0.60%
Average vote	0.6330 ± 0.28%	0.6810 ± 0.39%	0.6560 ± 0.33%	0.6937 ± 0.33%	0.6793 ± 0.31%
Majority vote	0.6308 ± 0.27%	0.6722 ± 0.37%	0.6473 ± 0.36%	0.6947 ± 0.35%	0.6748 ± 0.36%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.4032	0.3762	0.3579	0.3741	0.3118
Horizontal	0.4022 ± 0.37%	0.3771 ± 0.40%	0.3685 ± 0.57%	0.3810 ± 0.39%	0.3175 ± 0.38%
Vertical	0.4076 ± 0.32%	0.3641 ± 0.39%	0.3676 ± 0.36%	0.4058 ± 0.40%	0.3064 ± 0.39%
Rotate	0.4488 ± 0.54%	0.3987 ± 0.65%	0.3785 ± 0.45%	0.4044 ± 0.56%	0.3995 ± 0.58%
Zoom	0.4067 ± 0.76%	0.3991 ± 0.74%	0.3599 ± 0.60%	0.3653 ± 0.67%	0.3664 ± 0.67%
H_V_R_Z	0.4244 ± 0.69%	0.4011 ± 0.65%	0.3654 ± 0.76%	0.3825 ± 0.67%	0.3942 ± 0.75%
H_V	0.4016 ± 0.52%	0.3675 ± 0.53%	0.3737 ± 0.62%	0.4106 ± 0.53%	0.3044 ± 0.63%
H_R	0.4466 ± 0.54%	0.4051 ± 0.57%	0.3789 ± 0.51%	0.4090 ± 0.50%	0.4077 ± 0.51%
V_R	0.4475 ± 0.56%	0.4140 ± 0.65%	0.3798 ± 0.54%	0.4066 ± 0.59%	0.4029 ± 0.61%
H_V_R	0.4462 ± 0.51%	0.4045 ± 0.62%	0.3858 ± 0.44%	0.4049 ± 0.47%	0.3981 ± 0.53%
Average vote	0.4539 ± 0.28%	0.4330 ± 0.35%	0.4018 ± 0.26%	0.4206 ± 0.28%	0.3916 ± 0.32%
Majority vote	0.4542 ± 0.35%	0.4274 ± 0.37%	0.3996 ± 0.30%	0.4164 ± 0.33%	0.3910 ± 0.30%

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	0.4357	0.3693	0.3203	0.4187	0.4013
Horizontal	0.4469 ± 0.36%	0.3492 ± 0.40%	0.3389 ± 0.41%	0.4066 ± 0.38%	0.4192 ± 0.37%
Vertical	0.4494 ± 0.42%	0.3872 ± 0.38%	0.3294 ± 0.46%	0.4367 ± 0.41%	0.4332 ± 0.43%
Rotate	0.4661 ± 0.65%	0.4814 ± 0.56%	0.4760 ± 0.61%	0.5056 ± 0.53%	0.4898 ± 0.56%
Zoom	0.4418 ± 0.58%	0.4675 ± 0.51%	0.4653 ± 0.58%	0.4895 ± 0.73%	0.4518 ± 0.59%
H_V_R_Z	0.4484 ± 0.73%	0.4703 ± 0.62%	0.4495 ± 0.67%	0.5020 ± 0.71%	0.4829 ± 0.66%
H_V	0.4531 ± 0.51%	0.3633 ± 0.54%	0.3354 ± 0.60%	0.4289 ± 0.50%	0.4397 ± 0.48%
H_R	0.4623 ± 0.52%	0.4802 ± 0.64%	0.4654 ± 0.64%	0.5024 ± 0.64%	0.4915 ± 0.61%
V_R	0.4657 ± 0.54%	0.4817 ± 0.67%	0.4672 ± 0.62%	0.5101 ± 0.60%	0.4940 ± 0.47%
H_V_R	0.4642 ± 0.73%	0.4831 ± 0.66%	0.4650 ± 0.74%	0.5066 ± 0.69%	0.4862 ± 0.61%
Average vote	0.4845 ± 0.32%	0.4977 ± 0.35%	0.4608 ± 0.43%	0.5221 ± 0.40%	0.5129 ± 0.36%
Majority vote	0.4825 ± 0.36%	0.4944 ± 0.40%	0.4693 ± 0.48%	0.5230 ± 0.51%	0.5015 ± 0.47%

Dataset	Average vote (%)	Majority vote (%)
FINGER	22.87	21.53
HUMERUS	9.75	9.56
FOREARM	3.48	3.60
WRIST	10.48	9.96
ELBOW	8.91	8.14
HAND	15.60	14.92
SHOULDER	28.47	28.20

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	4.20 s	7.10 s	6.20 s	5.55 s	7.50 s
Horizontal	4.43 s ± 0.20	7.60 s ± 0.37	6.33 s ± 0.37	5.91 s ± 0.43	7.71 s ± 0.28
Vertical	4.61 s ± 0.42	7.63 s ± 0.34	6.53 s ± 0.48	6.06 s ± 0.35	7.85 s ± 0.37
Rotate	5.58 s ± 0.37	8.71 s ± 0.28	7.66 s ± 0.65	7.47 s ± 0.40	8.76 s ± 0.21
Zoom	5.92 s ± 0.53	8.83 s ± 0.27	7.65 s ± 0.47	7.38 s ± 0.48	8.87 s ± 0.33
H_V_R_Z	6.18 s ± 0.51	11.64 s ± 0.64	7.08 s ± 0.34	6.67 s ± 0.47	8.60 s ± 0.31
H_V	4.57 s ± 0.33	7.65 s ± 0.34	6.25 s ± 0.46	5.77 s ± 0.39	7.83 s ± 0.21
H_R	5.82 s ± 0.40	8.98 s ± 0.38	7.47 s ± 0.49	7.30 s ± 0.41	8.74 s ± 0.26
V_R	5.83 s ± 0.41	9.05 s ± 0.44	8.81 s ± 0.66	7.69 s ± 0.52	11.74 s ± 0.52
H_V_R	5.63 s ± 0.58	8.52 s ± 0.31	7.40 s ± 0.70	6.69 s ± 0.36	8.76 s ± 0.27

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	2.50 s	4.50 s	4.25 s	3.10 s	4.75 s
Horizontal	2.67 s ± 0.11	4.62 s ± 0.20	4.35 s ± 0.23	3.25 s ± 0.11	5.02 s ± 0.24
Vertical	2.64 s ± 0.08	4.62 s ± 0.16	4.49 s ± 0.24	3.32 s ± 0.22	4.90 s ± 0.20
Rotate	3.40 s ± 0.13	5.30 s ± 0.16	5.33 s ± 0.30	3.96 s ± 0.19	5.65 s ± 0.29
Zoom	3.37 s ± 0.14	5.32 s ± 0.28	5.66 s ± 0.50	4.02 s ± 0.20	5.70 s ± 0.32
H_V_R_Z	3.97 s ± 0.38	8.25 s ± 0.60	5 s ± 0.38	3.87 s ± 0.11	5.49 s ± 0.24
H_V	2.68 s ± 0.09	4.75 s ± 0.24	4.35 s ± 0.25	3.33 s ± 0.12	4.90 s ± 0.21
H_R	3.45 s ± 0.10	5.16 s ± 0.15	5.39 s ± 0.35	3.95 s ± 0.19	5.64 s ± 0.26
V_R	3.54 s ± 0.12	5.30 s ± 0.33	6.12 s ± 0.29	4.98 s ± 0.27	8.52 s ± 0.45
H_V_R	3.42 s ± 0.13	5.24 s ± 0.25	4.90 s ± 0.25	3.83 s ± 0.15	5.53 s ± 0.21

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	2.80 s	4.50 s	4.20 s	3.0 s	5.0 s
Horizontal	3.07 s ± 0.15	4.78 s ± 0.19	4.96 s ± 0.68	3.33 s ± 0.16	5.09 s ± 0.21
Vertical	3.07 s ± 0.18	4.80 s ± 0.21	4.76 s ± 0.53	3.24 s ± 0.10	5.13 s ± 0.23
Rotate	3.84 s ± 0.13	5.47 s ± 0.17	5.43 s ± 0.49	3.79 s ± 0.13	5.75 s ± 0.18
Zoom	3.96 s ± 0.16	5.39 s ± 0.17	5.27 s ± 0.44	3.87 s ± 0.16	5.75 s ± 0.20
H_V_R_Z	4.14 s ± 0.33	7.90 s ± 0.52	5.07 s ± 0.25	3.79 s ± 0.13	5.69 s ± 0.19
H_V	2.96 s ± 0.13	4.83 s ± 0.17	4.82 s ± 0.57	3.28 s ± 0.13	5.17 s ± 0.21
H_R	3.94 s ± 0.15	5.46 s ± 0.20	5.64 s ± 0.59	3.81 s ± 0.12	5.80 s ± 0.29
V_R	3.92 s ± 0.20	5.47 s ± 0.22	6.23 s ± 0.38	4.95 s ± 0.28	8.97 s ± 0.61
H_V_R	3.86 s ± 0.18	5.44 s ± 0.20	5.36 s ± 0.55	3.81 s ± 0.11	5.75 s ± 0.22

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	7.0 s	13.0 s	9.0 s	7.25 s	11.5 s
Horizontal	7.38 s ± 0.62	13.68 s ± 1.09	9.40 s ± 0.87	7.46 s ± 0.35	12.30 s ± 0.58
Vertical	7.04 s ± 0.37	14.11 s ± 1.79	9.47 s ± 0.93	7.63 s ± 0.38	12.60 s ± 0.42
Rotate	9.05 s ± 0.38	15.95 s ± 1.71	11.03 s ± 0.89	9.17 s ± 0.58	14.10 s ± 0.54
Zoom	9.17 s ± 0.50	15.53 s ± 1.07	11.52 s ± 1.05	9.38 s ± 0.62	14.66 s ± 0.63
H_V_R_Z	8.64 s ± 0.53	15.81 s ± 1.02	10.06 s ± 0.53	8.61 s ± 0.24	12.77 s ± 0.44
H_V	7.03 s ± 0.51	13.57 s ± 1.26	9.24 s ± 0.79	7.35 s ± 0.34	11.92 s ± 0.60
H_R	9.23 s ± 0.38	15.60 s ± 1.04	11.35 s ± 0.87	9.08 s ± 0.54	13.94 s ± 0.53
V_R	9.09 s ± 0.32	15.78 s ± 1.23	11.36 s ± 0.44	9.48 s ± 0.21	16.13 s ± 0.70
H_V_R	8.61 s ± 0.42	13.83 s ± 1.07	10.52 s ± 0.57	8.72 s ± 0.39	13.36 s ± 0.45

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	4.0 s	8.25 s	6.35 s	5.35 s	8.15 s
Horizontal	4.86 s ± 0.34	8.40 s ± 0.79	6.79 s ± 0.48	5.72 s ± 0.28	8.39 s ± 0.38
Vertical	4.88 s ± 0.31	8.46 s ± 0.93	6.82 s ± 0.54	5.76 s ± 0.37	8.62 s ± 0.33
Rotate	6.05 s ± 0.44	9.74 s ± 0.89	8.11 s ± 0.66	6.86 s ± 0.31	9.86 s ± 0.49
Zoom	6.03 s ± 0.35	10.29 s ± 1.02	8.15 s ± 0.70	6.84 s ± 0.47	10.33 s ± 0.75
H_V_R_Z	6.07 s ± 0.44	11.40 s ± 0.88	7.58 s ± 0.57	6.46 s ± 0.30	9.06 s ± 0.31
H_V	4.87 s ± 0.38	8.13 s ± 0.66	6.52 s ± 0.39	5.57 s ± 0.31	8.26 s ± 0.34
H_R	6.23 s ± 0.38	10.21 s ± 0.96	7.91 s ± 0.75	6.70 s ± 0.44	9.45 s ± 0.30
V_R	6.26 s ± 0.34	10.32 s ± 1.00	8.78 s ± 0.53	7.36 s ± 0.25	12.59 s ± 1.11
H_V_R	6.08 s ± 0.41	9.04 s ± 0.64	7.43 s ± 0.34	6.46 s ± 0.32	9.23 s ± 0.32

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	4.80 s	9.10 s	6.5 s	5.0 s	9.0 s
Horizontal	4.98 s ± 0.46	9.69 s ± 0.53	6.78 s ± 0.37	5.20 s ± 0.29	9.08 s ± 0.89
Vertical	5.10 s ± 0.44	10.01 s ± 0.61	7.07 s ± 0.51	5.16 s ± 0.35	9.42 s ± 1.11
Rotate	6.49 s ± 0.63	11.40 s ± 0.56	8.67 s ± 0.55	6.29 s ± 0.39	10.53 s ± 0.77
Zoom	6.57 s ± 0.59	11.36 s ± 0.64	8.90 s ± 0.46	6.43 s ± 0.38	10.61 s ± 1.02
H_V_R_Z	6.26 s ± 0.56	12.43 s ± 0.55	7.39 s ± 0.30	6.07 s ± 0.21	9.46 s ± 0.62
H_V	5.13 s ± 0.50	9.59 s ± 0.55	6.66 s ± 0.34	5.19 s ± 0.27	9.15 s ± 1.23
H_R	6.53 s ± 0.62	11.53 s ± 0.55	8.29 s ± 0.54	6.34 s ± 0.29	10.51 s ± 1.05
V_R	6.67 s ± 0.68	11.71 s ± 0.81	9.04 s ± 0.40	7.08 s ± 0.22	12.67 s ± 0.74
H_V_R	6.35 s ± 0.75	10.23 s ± 0.44	7.53 s ± 0.39	6.20 s ± 0.35	9.66 s ± 0.61

Technique	VGG19	InceptionV3	ResNet50	Xception	DenseNet121
Without TTA	6.0 s	9.50 s	7.5 s	6.25 s	10.50 s
Horizontal	6.46 s ± 0.43	10.08 s ± 0.70	7.97 s ± 0.34	6.46 s ± 0.32	11.02 s ± 0.49
Vertical	6.19 s ± 0.54	9.68 s ± 0.75	8.07 s ± 0.36	6.30 s ± 0.29	11.30 s ± 0.70
Rotate	8.35 s ± 0.60	11.60 s ± 1.01	9.31 s ± 0.40	7.59 s ± 0.44	13.49 s ± 0.93
Zoom	8.14 s ± 0.61	11.11 s ± 0.73	9.20 s ± 0.41	7.73 s ± 0.50	13.43 s ± 0.67
H_V_R_Z	7.82 s ± 0.57	10.67 s ± 0.71	8.73 s ± 0.38	7.51 s ± 0.30	11.74 s ± 0.50
H_V	6.43 s ± 0.46	9.70 s ± 0.69	8.17 s ± 0.33	6.43 s ± 0.27	10.81 s ± 0.39
H_R	8.26 s ± 0.64	10.84 s ± 0.68	9.40 s ± 0.44	7.67 s ± 0.43	13.23 s ± 0.86
V_R	8.56 s ± 0.69	12.92 s ± 1.01	10.33 s ± 0.88	8.47 s ± 0.34	15.31 s ± 0.97
H_V_R	7.57 s ± 0.56	10.86 s ± 0.71	8.93 s ± 0.44	7.47 s ± 0.30	12.08 s ± 0.43

PERMALINK

Improving convolutional neural networks performance for image classification using test time augmentation: a case study using MURA dataset

Ibrahem Kandel

Mauro Castelli

Abstract

Introduction

Methodology

Convolutional Neural Networks

VGG19

InceptionV3

ResNet50

Xception

DenseNet121

Test time augmentation

Dataset and the evaluation metric

Table 1.

Fig. 1.

Results

Table 2.

Fig. 2.

Finger images

Table 3.

Fig. 3.

Humerus images

Table 4.

Fig. 4.

Forearm images

Table 5.

Fig. 5.

Wrist images

Table 6.

Fig. 6.

Elbow images

Table 7.

Fig. 7.

Hand images

Table 8.

Fig. 8.

Shoulder images

Table 9.

Fig. 9.

Discussion

Table 10.

Conclusions

Acknowledgements

Appendix

Table 11.

Table 12.

Table 13.

Table 14.

Table 15.

Table 16.

Table 17.

Data availability

Declarations

Conflict of interest

References

Associated Data

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases