Abstract
Diabetes mellitus is a serious chronic disease that affects millions of people worldwide. In patients with diabetes, ulcers occur frequently and heal slowly. Grading and staging of diabetic ulcers is the first step of effective treatment, and wound depth and granulation tissue amount are two important indicators of wound healing progress. However, wound depths and granulation tissue amounts of different severities can visually appear quite similar, making accurate machine learning classification challenging. In this paper, we innovatively adopted the fine-grained classification idea for diabetic wound grading by using a Bilinear CNN (Bi-CNN) architecture to deal with highly similar images of five grades. Wound area extraction, sharpening, resizing and augmentation were used to pre-process images before they were input to the Bi-CNN. Innovative modifications of the generic Bi-CNN network architecture are explored to improve its performance. Our research also generated a valuable wound dataset: in collaboration with wound experts from the University of Massachusetts Medical School, we collected 1639 diabetic wound images and annotated them with wound depth and granulation tissue grades as classification labels. Deep learning experiments were conducted using holdout validation on this diabetic wound dataset. Comparisons with widely used CNN classification architectures demonstrated that our Bi-CNN fine-grained classification approach outperformed prior work for the task of grading diabetic wounds.
Keywords: wound assessment, fine-grained classification, diabetic wounds, wound depth, wound granulation tissue amounts, deep learning
I. Introduction
Diabetes mellitus is a serious chronic disease that affects an estimated 425 million people worldwide (or 8.8% of the adult population) [1]. In the U.S. in 2015, about 23.1 million people of all ages (7.2% of the U.S. population) had diagnosed diabetes [2]. In diabetic populations, diabetic wounds occur easily due to reasons including a higher frequency and intensity of mechanical changes in the conformation of the bony architecture, peripheral neuropathy, and atherosclerotic peripheral arterial disease [3], [4]. Diabetic wounds have a lifetime prevalence estimated between 12% and 25% [2] and a high recurrence rate ranging from 7.8% [5] to 48.0% [6].
Diabetic wounds may take months to years to heal and require regular checkups by wound nurses who debride the wound, inspect its healing progress and recommend visits to wound experts when necessary. Consistent and accurate wound care is crucial for proper diabetic wound healing and delays in visiting a wound specialist could increase the risk of lower extremity amputation or even death [2]. However, a shortage of wound experts especially in rural areas can cause late diagnosis and poor wound care [7]. Moreover, unnecessary hospital visits increase the workload of clinicians and add an avoidable financial burden for patients. A smartphone photo-based wound assessment system that patients or visiting nurses can use in the patients’ homes is a promising solution to these problems.
Since 2011, our group has been researching and developing the Smartphone Wound Analysis and Decision-Support (SmartWAnDS) system, which will autonomously analyze wound images captured by patients’ smartphone cameras and generate wound care decisions. SmartWAnDS will support decisions made by wound nurses in remote locations, thus standardizing the care of diabetic wounds. The SmartWAnDS system would also enable patients to get feedback anytime between visits, engaging them in their care. Grading and staging of diabetic ulcers is the first step of effective treatment and has been shown to significantly affect and predict the wound’s outcome. Increases in ulcer grades have been found to correlate with increases in amputation rates [8], [9]. Consequently, our research group is focusing on an autonomous photo-based wound severity grading system.
Wound depth and granulation tissue amount are two important attributes that indicate the wound’s severity during grading. However, machine learning classification is challenging because wound depths and granulation tissue amounts of different severities can appear quite similar (see Figures 1 and 2). Fine-grained image classification is an emerging intra-class image classification approach, which tries to recognize sub-categories within the same main category. These sub-categories usually look quite similar (e.g. recognizing different sub-types of flowers [23], [24], plants [25], [26], insects [27], [28], birds [29]–[38], dogs [39]–[42], vehicles [37], [43] and shoes [44]) and do not have obvious discriminative features such as different shapes, colors and textures, making classification challenging. In this paper, we utilize the fine-grained neural network approach to improve the accuracy of classifying the wound depth and granulation tissue amount attributes compared to prior work.
Fig. 1:
Example images with different wound depth scores: (a)-(e) score ranging from 0 to 4.
Fig. 2:
Example images with different granulation tissue amount scores: (a)-(e) score ranging from 0 to 4.
Our rubric for wound grading is the Photographic Wound Assessment Tool (PWAT) [10]–[12], which has been proposed by wound assessment experts to enable novices to accurately grade wounds and has been generally accepted as a standard for photo-based wound evaluation. PWAT uses eight criteria for grading wound healing: size, depth, necrotic tissue type, total amount of necrotic tissue, granulation tissue type, total amount of granulation tissue, edges and periulcer skin viability. Each PWAT criterion can be scored from 0 to 4 (good to bad), yielding a maximum total score of 32. Details of the PWAT assessment criteria for depth and total amount of granulation tissue are shown in Table I. Since the PWAT scores for each attribute range from 0 to 4, we consider the grading of the wound depth and granulation tissue amount of diabetic wounds as a five-class image classification task.
TABLE I:
PWAT assessment rubric for wound depth and granulation tissue amount.
| Attribute | PWAT Scoring Rubric |
|---|---|
| Wound depth | 0 = wound is healed (skin intact) or nearly closed (< 0.3 cm²) |
| | 1 = full thickness |
| | 2 = unable to judge because majority of wound base is covered by yellow/black eschar |
| | 3 = full thickness involving underlying tissue layers |
| | 4 = tendon, joint capsule, bone visible/present in wound base |
| Granulation tissue amount | 0 = wound is closed (skin intact) or nearly closed (< 0.3 cm²) |
| | 1 = 75% to 100% of open wound is covered with granulation tissue |
| | 2 = > 50% and < 75% of open wound is covered with granulation tissue |
| | 3 = 25% to 50% of wound bed is covered with granulation tissue |
| | 4 = < 25% of wound bed is covered with granulation tissue |
Previous photo-based automatic wound assessment research mostly used traditional machine learning approaches with hand-crafted image descriptors such as color and textural features [13], color histograms [14], local binary patterns (LBP) [15], and morphological and topological characteristics [16]. There has been little research into image-based wound depth evaluation. Acha et al. extracted first-order statistical features [17] and color and texture features [18] for wound diagnosis. In summary, these approaches all rely on handcrafted image descriptors for classification, which may not effectively distinguish similar wound attribute sub-classes.
Following the success of deep neural networks in many computer vision and image analysis tasks, they are increasingly being used for wound image analyses. Convolutional Neural Networks (CNNs) are the most widely used architectures for wound image analyses. To classify healthy skin versus wounds, the CNN-based DFUNet [19] and LeNet [20] were proposed. CNN architectures have also performed well for wound tissue type classification [21], [22]. However, these prior deep learning methods adopted CNN architectures for classifying wound images into two or three very distinct classes (e.g., skin vs. wound vs. background). In contrast to prior work, our goal was to classify wound depth and granulation tissue amount into five grades ranging from 0 to 4 based on the PWAT grading rubric. As wound depth and granulation tissue amount of different grades do not have obvious distinguishing visual characteristics, our classification task was challenging.
To solve these problems, we innovatively adopted a Bilinear CNN (Bi-CNN) architecture specifically designed for fine-grained classification for grading diabetic wounds into five classes. The main contributions of this paper are four-fold:
In collaboration with wound experts, we created a large diabetic wound image dataset that we then annotated with their corresponding wound depth and granulation tissue amount based on the PWAT rubric.
We innovatively applied a Bi-CNN fine-grained classification deep neural network to deal with the challenging task of recognizing different grades of wound depth and granulation tissue amount, which are highly similar.
We modified the generic Bi-CNN network architecture and adopted several pre-processing techniques to improve the Bi-CNN’s performance in automatic diabetic wound grading.
Our results show that our modified Bi-CNN outperformed other widely used CNN classification architectures, demonstrating that the fine-grained classification approach can significantly improve wound attribute classification accuracy.
To the best of our knowledge, this is the first work to classify wound depth and granulation tissue on a 5-point scale, and the first attempt to adopt state-of-the-art fine-grained classification deep neural networks for wound image analyses. Our experimental results showed that our proposed approach is promising for diabetic wound analyses.
The rest of this paper is organized as follows: Section II summarizes related work, highlighting its differences from our work, and describes our Bi-CNN methodology. Section III presents the wound image dataset utilized in our study and the implementation details of model training. Our analyses, results and findings are presented in Section IV. In Section V we discuss possible improvements and suggest directions for future work. Finally, Section VI concludes the paper.
II. Related Work
Traditional computer vision classification approaches based on manual feature extraction are not effective solutions for sub-classes that appear quite similar. Thus, we utilize fine-grained deep classification neural networks to learn latent discriminative features from our diabetic wound dataset. Nejati et al. [48] is the only work we found that uses deep neural networks for fine-grained wound tissue analysis. However, they utilized AlexNet, a general image recognition architecture that is not specifically designed for fine-grained classification. Also, they addressed the wound tissue classification problem, for which manually extracting features such as color, shape and texture can be an effective approach. In contrast, our depth and granulation tissue amount grading task is more challenging, as the class differences cannot be captured by obvious visual features.
To the best of our knowledge, our work is the first that uses a deep neural network specifically designed for fine-grained classification for the analyses of fine classes of wound attribute grading.
A. Bilinear Convolutional Neural Network for wound grading (Bi-CNN)
The Bilinear Convolutional Neural Network (Bi-CNN) architecture was first proposed by Lin et al. [37]. In their paper, the Bi-CNN performed well on a birds dataset with 200 categories, an aircraft dataset with 100 categories and a cars dataset with 196 categories. The Bi-CNN consists of two parallel CNN-based feature-extraction streams whose outputs are multiplied using the outer product at each location of the image and pooled across locations to obtain a bilinear vector as the learned image descriptor, which is followed by a fully connected layer.
In our work, we utilized the VGG16 architecture [49], pre-trained on more than 14 million images from ImageNet [50], as the base network for both streams of our Bi-CNN architecture for wound grade classification. We also compared the Bi-CNN approach to the classic VGG16 network, which has 16 weight layers (13 convolutional and 3 fully connected). Based on this architecture, we utilized five convolutional blocks for both streams, and the feature outputs were then combined at each location using the matrix outer product, followed by sum pooling.
The bilinear feature BF is calculated by:
$$BF(L, I_A, I_B) = S_A(L, I_A)^{T}\, S_B(L, I_B) \tag{1}$$
where L is the location of the current pixel, and I_A and I_B are the input images, which are identical in our proposed method. S_A and S_B are the two feature outputs extracted from Stream A and Stream B, respectively. We then adopt sum pooling to obtain the bilinear vector φ(I) by calculating:
$$\varphi(I) = \sum_{L} BF(L, I_A, I_B) \tag{2}$$
The resulting bilinear vector p = φ(I) is then passed through a signed square-root step (q ← sign(p)·√|p|), followed by ℓ2 normalization (r ← q/∥q∥₂), which has been shown to improve the model’s performance in practice [37].
After generating the bilinear vector, a dropout layer is applied to avoid overfitting, followed by a soft-max layer. Fig. 3 lists the detailed parameters of our proposed Bi-CNN architecture for wound grade classification. The outer product captures pairwise correlations between feature channels and can model part-feature interactions. For example, for the classification of wound depth grades, one of the networks can act as a part detector that locates the edges of the wound, while the other acts as a local feature extractor that recognizes the depth of the wound. Thus, because this architecture can model local pairwise feature interactions [37], it is particularly useful for fine-grained categorization. The pipeline of our proposed Bi-CNN architecture for grading wound depth and granulation tissue amount is shown in Fig. 4.
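To make the data flow concrete, below is a minimal PyTorch sketch of this bilinear head; it reflects our reading of the architecture rather than the exact implementation, and the shared backbone, dropout rate and the choice of truncating VGG16 before its final pooling layer are assumptions.

```python
# A minimal sketch of a Bi-CNN head with a VGG16 backbone (an illustration, not the
# released code). The five convolutional blocks produce 512 feature maps, which are
# combined by an outer product at every location, sum-pooled (Eqs. 1-2), passed through
# signed square root and l2 normalization, then dropout and a five-way PWAT classifier.
import torch
import torch.nn as nn
import torchvision

class BiCNN(nn.Module):
    def __init__(self, num_classes=5, dropout=0.2):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")
        self.features = vgg.features[:-1]        # drop the last max-pool: 448 -> 28x28 maps
        self.dropout = nn.Dropout(p=dropout)
        self.fc = nn.Linear(512 * 512, num_classes)   # bilinear vector has 262144 entries

    def forward(self, x):                        # x: (N, 3, 448, 448)
        f = self.features(x)                     # (N, 512, 28, 28)
        n, c, h, w = f.shape
        f = f.view(n, c, h * w)                  # channels x locations
        bf = torch.bmm(f, f.transpose(1, 2))     # sum over locations of outer products
        bf = bf.view(n, -1)                      # bilinear vector p
        bf = torch.sign(bf) * torch.sqrt(torch.abs(bf) + 1e-10)   # signed square root
        bf = nn.functional.normalize(bf)         # l2 normalization
        return self.fc(self.dropout(bf))         # logits; softmax is applied in the loss
```

Because both streams here share one VGG16 backbone, the sketch corresponds to a symmetric Bi-CNN variant; two independently weighted streams could be obtained by instantiating a second backbone.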
Fig. 3:
Proposed Bi-CNN architecture and parameters for wound grading.
Fig. 4:
The pipeline of Bi-CNN based diabetic wound grading architecture.
B. End-to-end training
Our network for wound severity grading can be trained using an end-to-end approach. All the parameters in the network were trained by back-propagating the gradients of the classification loss. We adopted the cross-entropy loss function in our experiments. Based on the chain rule, back-propagation of gradients through bilinear pooling is shown in Fig. 5. ∂E/∂S_A and ∂E/∂S_B are the gradients of the loss function E with respect to the feature outputs S_A (from Stream A) and S_B (from Stream B), respectively. Thus, we have:
$$\frac{\partial E}{\partial S_A} = S_B \left(\frac{\partial E}{\partial BF}\right)^{T} \tag{3}$$

$$\frac{\partial E}{\partial S_B} = S_A \left(\frac{\partial E}{\partial BF}\right) \tag{4}$$
Fig. 5:
Back propagation of gradients through bilinear pooling.
For other layers, the gradients before bi-linear pooling and in the classification layer are straightforward and can be computed using the chain rule.
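As an illustrative sanity check (not part of the paper’s pipeline), the closed-form gradients of Equations (3) and (4) can be compared against automatic differentiation; the feature sizes below are arbitrary.

```python
# Compare the bilinear-pooling gradients of Eqs. (3)-(4) with PyTorch autograd.
import torch

L, cA, cB = 196, 512, 512                      # number of locations and channels per stream
SA = torch.randn(L, cA, requires_grad=True)    # Stream A features (locations x channels)
SB = torch.randn(L, cB, requires_grad=True)    # Stream B features
BF = SA.t() @ SB                               # pooled bilinear feature (Eqs. 1-2)

G = torch.randn_like(BF)                       # plays the role of dE/dBF
BF.backward(G)

print(torch.allclose(SA.grad, SB @ G.t()))     # Eq. (3): dE/dSA = SB (dE/dBF)^T
print(torch.allclose(SB.grad, SA @ G))         # Eq. (4): dE/dSB = SA (dE/dBF)
```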
III. Materials and implementation details
Our proposed wound evaluation system consists of three major steps: image pre-processing, fine-grained neural network model training and then using the trained model for wound grading. Fig. 6 shows the flow diagram of our proposed wound evaluation system. In the following, we will describe the wound image dataset we utilized, after which we cover in detail each of the stages of our approach.
Fig. 6:
Flow diagram of the proposed wound evaluation system.
A. Wound image dataset
All the images we used for the study of wound evaluation systems were acquired in one of three ways. First, we acquired 114 wound images captured with a wound imaging box [51], which maintained a consistent, homogeneous lighting environment for imaging the wound. Second, 202 images were gathered from wound images publicly available on the Internet, which were mostly captured from a relatively perpendicular angle. Third, 1323 patient wound images collected at the University of Massachusetts Medical School (UMMS) were received after IRB approval for our use. These images had large variations in lighting, viewing angles, wound types and skin texture. In total, we gathered 1639 diabetic wound images from these three sources for inclusion in our wound dataset. For all experiments we conducted five-fold holdout validation; each fold had 1477 images for training and 162 images for testing. The number and percentage of images in each class are shown in Table II.
TABLE II:
Statistics of our collected data set.
| Score/Class | | 0 | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|---|---|
| Wound depth | Train | 28 (1.9%) | 161 (10.9%) | 777 (52.6%) | 352 (23.8%) | 159 (10.8%) | 1477 |
| | Test | 3 | 17 | 86 | 39 | 17 | 162 |
| Granulation tissue amount | Train | 109 (7.4%) | 130 (8.8%) | 96 (6.5%) | 222 (15.0%) | 920 (62.3%) | 1477 |
| | Test | 12 | 14 | 10 | 24 | 102 | 162 |
B. Pre-processing
In our proposed method, we adopted four pre-processing steps to facilitate good feature extraction. One step was specific to our wound classification problem, while the others were standard pre-processing steps used to prepare images before inputting them into deep networks. The details of our pre-processing steps are as follows.
1). Image patches for training:
As most of the original wound images had large background regions (an example is shown in Fig. 7 (a)), if such images are classified directly, the deep network will learn mostly the visual features of the background and thus will not accurately learn wound features. Thus, in our proposed method, we first segmented out the target wound region using an annotation app proposed in our previous research [52], from which we generated a wound mask. A view of this annotation app is shown in Fig. 8.
Fig. 7:
An example of an image patch generation. (a)Captured wound image (b) Segmentation mask for wound (c) Wound image patch.
Fig. 8:
A view of annotation app. (a) Annotated wound image (b) Wound mask.
After segmentation, we derived a bounding box of the recognized wound area, and cropped the images using these bounding boxes to create wound image patches. Finally, all the image patches were cropped to dimensions of 256 × 256 × 3 pixels, and were sometimes resized again according to the needs of different deep networks in our experiments. An example of the image patch generation is shown in Fig. 7.
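A sketch of this patch-generation step is shown below; the file names are hypothetical and OpenCV is used only for illustration.

```python
# Derive a bounding box from the wound mask, crop the photo to it, and resize the
# patch to 256 x 256 x 3. File names are placeholders.
import cv2
import numpy as np

image = cv2.imread("wound_photo.jpg")                       # captured wound image
mask = cv2.imread("wound_mask.png", cv2.IMREAD_GRAYSCALE)   # mask from the annotation app

ys, xs = np.nonzero(mask > 0)                 # coordinates of pixels labeled as wound
x0, x1 = xs.min(), xs.max() + 1
y0, y1 = ys.min(), ys.max() + 1

patch = cv2.resize(image[y0:y1, x0:x1], (256, 256))   # crop to the box, then resize
cv2.imwrite("wound_patch.png", patch)
```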
2). Image enhancement:
In order to improve the performance of the fine-grained deep neural network, we sharpened the input wound images to enhance their features and make the textures of the wound clearer. An example of a sharpened wound image is shown in Fig. 9.
Fig. 9:
Image sharpening process. (a) the original wound image patch. (b) the image patch after sharpening.
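The exact sharpening filter is an implementation detail not specified here; one common choice is unsharp masking, sketched below for illustration.

```python
# Unsharp masking: add back the difference between the patch and a blurred copy,
# which boosts edges and texture. Gains and blur radius are illustrative values.
import cv2

patch = cv2.imread("wound_patch.png")
blurred = cv2.GaussianBlur(patch, (0, 0), sigmaX=3)
sharpened = cv2.addWeighted(patch, 1.5, blurred, -0.5, 0)
cv2.imwrite("wound_patch_sharp.png", sharpened)
```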
3). Resizing:
Since all pre-trained networks used in our proposed method and in the approaches we compared against expect input images of a specific size during training, we resized all images to a standard dimension of 448 × 448 × 3 and subtracted the mean of the image before propagating it through the network.
4). Image augmentation:
While our dataset contained 1639 images, we needed more images in order to train a robust deep learning model. Image augmentation is a commonly used technique to enlarge the training dataset by creating variations of each image in the training dataset, yielding more generalized deep neural networks. In our experiments, we created variants of each training image rotated by 90, 180 and 270 degrees as augmentations. Fig. 10 shows an augmentation example.
Fig. 10:
Sample image augmentations done during training: (a) original image (b) 90° rotation (c) 180° rotation (d) 270° rotation.
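The augmentation and resizing steps could be implemented as sketched below, where rotated copies are written to disk offline and a torchvision transform performs the resize and mean subtraction; the file names and the use of the ImageNet channel means are assumptions.

```python
# Offline augmentation: save 90/180/270-degree rotated copies of each training patch,
# then resize to 448 x 448 and subtract the channel means before feeding the network.
from PIL import Image
import torchvision.transforms as T

img = Image.open("train/patch_001.png")
for angle in (90, 180, 270):
    img.rotate(angle, expand=True).save(f"train/patch_001_rot{angle}.png")

to_input = T.Compose([
    T.Resize((448, 448)),
    T.ToTensor(),
    # std of 1.0 keeps this as pure mean subtraction (ImageNet means assumed).
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
])
```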
C. Model training
We adopted a two-step model training strategy with transfer learning and fine-tuning steps, which are expounded on below.
1). Transfer learning on wound data set:
Transfer learning is a commonly used approach in deep learning in which a model trained on one task is used as the starting point for a new, related task. By transferring a model from pre-trained networks, we are able to take advantage of the abundant data and attributes already learned by the pre-trained network. Thus, we ran our wound images through the pre-trained networks and took the output of the FC layers, as was done in prior work [53], [54].
In our work, we first applied transfer learning by freezing all parameters of the convolutional blocks, adding a dropout layer and a five-way soft-max layer (one output per wound grade) on top of the network, and then training these new layers. The network is trained by minimizing the cross-entropy loss, as shown in formula (5).
$$E(\theta) = -\sum_{p}\sum_{j=1}^{k} t_{pj}\,\ln y_j(x_p, \theta) \tag{5}$$
and
$$y_j(x_p, \theta) = \frac{e^{\theta_j^{T} x_p}}{\sum_{i=1}^{k} e^{\theta_i^{T} x_p}} \tag{6}$$
where k is the number of image categories, t_{pj} is an indicator that the pth image belongs to class j, θ represents the parameters of the softmax classifier, and y_j(x_p, θ) is the network output for the pth image.
For transfer learning, we used a weight decay of 1e-8, a batch size of 16 and a relatively high base learning rate of 1. The initial weights of the modified layer were generated using the Kaiming uniform approach [55]. After training for about 20–30 epochs, the Bi-CNN classification outputs became stable. We utilized Stochastic Gradient Descent with Momentum (SGDM) as the optimizer in all our experiments.
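Under the settings listed above, the transfer-learning step could look like the following sketch, which reuses the BiCNN module sketched in Section II-A; the momentum value and the placeholder data loader are assumptions.

```python
# Transfer learning: freeze the convolutional blocks, initialize the new 5-way classifier
# with Kaiming uniform weights [55], and train only the new layers with SGD with momentum,
# weight decay 1e-8, batch size 16 and base learning rate 1.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = BiCNN(num_classes=5, dropout=0.2)
for p in model.features.parameters():          # freeze all convolutional blocks
    p.requires_grad = False
nn.init.kaiming_uniform_(model.fc.weight)      # Kaiming uniform initialization
nn.init.zeros_(model.fc.bias)

# Placeholder dataset of random tensors; in practice this is the augmented wound-patch set.
train_set = TensorDataset(torch.randn(64, 3, 448, 448), torch.randint(0, 5, (64,)))
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()              # cross-entropy loss of Eq. (5)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1.0,
                            momentum=0.9, weight_decay=1e-8)

for epoch in range(30):                        # roughly 20-30 epochs are enough
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```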
2). Fine-tuning the proposed wound grading deep neural network:
Simply applying transfer learning (i.e. using pre-trained networks without fine-tuning) achieved relatively good classification results (shown in Table VI). We then hypothesized that fine-tuning the pre-trained networks using diabetic wound images would yield higher-quality features from the images. We fine-tuned the entire model and all layers using back-propagation for about 20–30 epochs at a relatively small learning rate (0.0001), keeping all other input parameters the same as in transfer learning. Our accuracy increased by around 5% on the wound depth test dataset and 8% on the granulation tissue amount test dataset, as detailed in Section IV-D.
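A minimal sketch of this fine-tuning step, assuming the model and training loop from the previous sketch, is:

```python
# Fine-tuning: unfreeze every layer and continue training at a small learning rate.
for p in model.parameters():
    p.requires_grad = True

optimizer = torch.optim.SGD(model.parameters(), lr=1e-4,
                            momentum=0.9, weight_decay=1e-8)
# ...then repeat the training loop above for another 20-30 epochs.
```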
TABLE VI:
Results obtained by different Fully Connected (FC) layer architectures. WD represents Wound Depth and GTA stands for Granulation Tissue Amount.
| Architecture | Data set | WD | GTA |
|---|---|---|---|
| 3 FC layers | Train | 84.00% | 74.00% |
| | Test | 79.27% | 70.37% |
| 1 FC layer | Train | 86.00% | 82.00% |
| | Test | 82.93% | 72.83% |
IV. Results
A. Evaluation metrics
Our classification approach was evaluated using two performance measures: accuracy and weighted F1 score [56]. We also analyzed the classification performance of each class by showing the confusion matrix.
Accuracy is the ratio of the number of images correctly classified by the algorithm to the total number of images in the test dataset, and is calculated as:
$$\text{Accuracy} = \frac{\text{Number of correctly classified test images}}{\text{Total number of test images}} \tag{7}$$
We also adopted the weighted F1 score as an evaluation metric, which is defined as:
$$F1_{\text{weighted}} = \sum_{i} w_i \cdot \frac{2\, P_i R_i}{P_i + R_i} \tag{8}$$
where P_i and R_i stand for the precision and recall of class i, respectively, which can be calculated as:
$$P_i = \frac{TP_i}{TP_i + FP_i} \tag{9}$$

$$R_i = \frac{TP_i}{TP_i + FN_i} \tag{10}$$
where TP_i, FP_i, TN_i and FN_i represent the numbers of True Positives, False Positives, True Negatives and False Negatives of class i, respectively, and w_i is the weight of class i, calculated as the proportion of class i images in the test dataset.
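For reference, these metrics can be computed directly from a confusion matrix, as in the sketch below (function names are illustrative).

```python
# Accuracy and weighted F1 from a confusion matrix C, where C[i, j] counts class-i
# test images that were predicted as class j (Eqs. 7-10).
import numpy as np

def accuracy(C):
    C = np.asarray(C, dtype=float)
    return float(np.trace(C) / C.sum())                   # Eq. (7)

def weighted_f1(C):
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)
    precision = tp / np.maximum(C.sum(axis=0), 1e-12)     # Eq. (9): TP / (TP + FP)
    recall = tp / np.maximum(C.sum(axis=1), 1e-12)        # Eq. (10): TP / (TP + FN)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    weights = C.sum(axis=1) / C.sum()                     # proportion of each class
    return float((weights * f1).sum())                    # Eq. (8)
```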
B. Choosing the best dropout rate
In order to avoid overfitting during training and achieve a more generalized model, we added a dropout layer before the softmax layer. The dropout rate was chosen experimentally, leading us to set the optimal dropout rate as 0.2 for wound depth images and 0.3 for granulation tissue amount. The performance of our Bi-CNN with different dropout rates is shown in Table III and Table IV. We selected the dropout rate that yielded a high test set accuracy with a relatively small gap between the train and test accuracies, which indicates that the model is not overfitting.
TABLE III:
The performance with different dropout rates on wound depth images.
| Dropout rate | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|
| Train accuracy | 97% | 91% | 85% | 78% |
| Test accuracy | 81% | 85% | 84% | 82% |
TABLE IV:
The performance with different dropout rates on granulation tissue amount.
| Dropout rate | 0.1 | 0.2 | 0.3 | 0.4 |
|---|---|---|---|---|
| Train accuracy | 98% | 94% | 88% | 79% |
| Test accuracy | 84% | 85% | 85% | 81% |
C. Influence of different fully connected layer architectures
We explored two different fully connected layer architectures and compared their classification accuracies. First, we adopted an architecture with 3 fully connected layers after bilinear pooling, shown in Table V. The experimental results were obtained by setting the base learning rate to 0.01 and training for 60 epochs. Second, we explored an architecture using one Fully-Connected (FC) layer mapped to a softmax layer, as introduced in Section II-A.
TABLE V:
Parameters of 3 fully connected layers architecture.
| Type | Input size | Output size |
|---|---|---|
| FC | 262144 | 1000 |
| Dropout (p=0.2) | | |
| FC | 1000 | 1000 |
| Dropout (p=0.2) | | |
| FC | 1000 | 5 |
| Soft-max | | |
For this comparison, the results were obtained by re-training only the Fully Connected (FC) layers with the convolutional layers frozen. The details are shown in Table VI.
From Table VI, we can see that simply using one FC layer yielded the best results. Hence, in our following experiments, we adopted a single FC layer architecture.
D. Accuracy and Stability analyses
To better evaluate the performance of our proposed wound grading network, we adopted five-fold holdout validation in our experiments. We extracted five sets of test images, ensuring no overlap between sets and equal proportions of the different grades of original diabetic wound images in each set. Our final results are the average classification accuracy over the five folds. Fig. 11 shows a sample training accuracy trajectory and the loss history of the best-performing model in the five-fold holdout validation for the depth and granulation tissue amount datasets, respectively.
Fig. 11:
Training progress on the depth and granulation tissue amount datasets: the accuracy and loss history of the best-performing model in five-fold holdout validation. (a)-(b) are the results on the depth dataset. (c)-(d) are the results on the granulation tissue amount dataset.
Fig. 11 shows that our proposed network for wound grading converges very fast, obtaining a stable accuracy and training loss after around 20 epochs, which also demonstrates the good stability of the proposed method. The accuracy of the five-fold holdout validation experiment is shown in Table VII and analyzed using the box plots shown in Fig. 12. The best accuracy is 84.6% for both wound depth and granulation tissue amount grading.
TABLE VII:
Results of the five-fold holdout validation experiment using various evaluation metrics.
| Index | Label category | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 |
|---|---|---|---|---|---|---|
| Accuracy | Wound depth | 84.0% | 84.6% | 80.3% | 84.6% | 83.3% |
| | Granulation tissue amount | 84.6% | 83.3% | 82.7% | 84.6% | 81.5% |
| F1-score | Wound depth | 0.8367 | 0.8433 | 0.7933 | 0.8489 | 0.8290 |
| | Granulation tissue amount | 0.8382 | 0.8229 | 0.8146 | 0.8378 | 0.8004 |
Fig. 12:
Box plot of changes in accuracy across five folds.
To evaluate the classification accuracy of each class, we generated confusion matrices for the test set with the best accuracy for wound depth and granulation tissue amount classification, which are shown in Fig. 13 and Fig. 14.
Fig. 13:
Confusion matrix of wound depth dataset. The accuracy is 84.57%
Fig. 14:
Confusion matrix of granulation tissue amount dataset. The accuracy is 84.57%
In the confusion matrices, the numbers on the diagonal represent images that were correctly classified. We can see that the majority of test images lie on or near the diagonal. For our wound grading task, numbers above the diagonal indicate that the wound severity has been overestimated, in which case patients will be recommended to visit a wound expert for an unnecessary examination and further treatment. Although such incorrectly scored images will increase costs to the health care system, they will not affect the patients’ health adversely. Of greater concern are the numbers below the diagonal, which indicate that the wound severity has been underestimated. Such patients may need to visit the wound clinic, but our system will not correctly assess this.
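An illustrative helper (not from the paper) makes this reading of the confusion matrix explicit:

```python
# Split confusion-matrix errors into overestimated severity (above the diagonal)
# and underestimated severity (below it), assuming C[i, j] = true grade i, predicted j.
import numpy as np

def over_under_rates(C):
    C = np.asarray(C, dtype=float)
    over = np.triu(C, k=1).sum()     # predicted grade higher than the true grade
    under = np.tril(C, k=-1).sum()   # predicted grade lower than the true grade
    return over / C.sum(), under / C.sum()
```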
E. Error Analysis of Mis-Classified Wound Images
Next, we performed an error analysis by qualitatively assessing why individual images were mis-classified. We discovered that unstable lighting, blurring, low resolution and controversial labels were the most common causes of misclassification, accounting for about 30% of mis-classified images. Some examples of such images are shown in Fig. 15 and Fig. 16.
Fig. 15:
Examples of mis-classified wound depth images and their common causes.
Fig. 16:
Examples of mis-classified granulation tissue amount images and their common causes.
Following these analyses, we believe that additional image pre-processing techniques could be used to deal with uncertain conditions during photo capture, mitigating the effects of blur, poor illumination and low resolution on wound grading. Additionally, patients could be given picture-taking guidelines to improve the quality of the wound images they take for assessment by our system and wound experts.
F. Comparing with other networks
We compared the experimental results of our Bi-CNN with five CNN architectures that have shown excellent performance in previous classification tasks: AlexNet [57], VGG16 [49], two variants of ResNet [58] (ResNet18 and ResNet50) and DenseNet [59]. VGGNet was the runner-up in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014, while AlexNet and ResNet were the winners of the challenge in 2012 and 2015, respectively. The DenseNet paper won the CVPR Best Paper Award in 2017. Using the same holdout folds for testing, we obtained the experimental results; the best accuracies of the different CNN architectures are shown in Table VIII.
TABLE VIII:
The results of different CNN architectures. WD represents wound depth and GTA stands for granulation tissue amount.
| Training | Method | WD | GTA |
|---|---|---|---|
| w/o fine-tuning | VGG16 | 71.9% | 76.6% |
| | Densenet | 71.9% | 69.8% |
| | Alexnet | 74.5% | 75.5% |
| | Bi-CNN | 82.9% | 76.8% |
| w/ fine-tuning | Resnet18 | 78.1% | 81.7% |
| | Resnet50 | 79.3% | 81.7% |
| | Bi-CNN | 84.6% | 84.6% |
By comparing our Bi-CNN approach with other CNN architectures that have shown excellent performance on other classification tasks, we see that adopting the fine-grained classification idea for the wound grading problem improved performance and demonstrated the effectiveness of our proposed approach.
V. Discussion and future work
A. Mitigating class imbalance
As shown in Table II, the number of images in the different classes is not well balanced: 52.6% of wound depth images are in grade 2 and 62.3% of granulation tissue amount images are in grade 4. It is common practice to balance such datasets using data augmentation. By adopting horizontal flips, vertical flips and translations, we augmented the dataset to balance all classes before applying our proposed architecture. In our experiments, this data augmentation did not improve our experimental results; in fact, our test set accuracy dropped by around 2%. Therefore, we did not explicitly address class imbalance in our approach, nor did the prior work by Matsunaga et al. [60], Barata et al. [61] and Menegola et al. [62], the three top teams of the ISIC 2017 challenge. In future work, collecting more diabetic wound images, especially for the classes with fewer images, will be important to facilitate robust model training.
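For completeness, the balancing augmentation we tried could be sketched as follows, where minority-class images are passed repeatedly through a random flip-or-translate transform; the translation range is an assumption.

```python
# Class-balancing augmentation (horizontal flip, vertical flip, small translation)
# applied to minority-class images until all classes have similar counts.
import torchvision.transforms as T

balance_transform = T.RandomChoice([
    T.RandomHorizontalFlip(p=1.0),
    T.RandomVerticalFlip(p=1.0),
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # up to 10% shift in x and y
])
```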
B. Exploring additional pre-processing techniques
Exploring pre-processing techniques such as image deblurring, image super-resolution and illumination correction to obtain clearer images for training our model will be another important direction.
C. Investigating the effects of downsampling wound images
The pre-trained VGG16 networks we utilized for both streams of our proposed Bi-CNN wound grading architecture require images to be resized to a fixed dimension. Resizing may cause some valuable information to be lost during the downsampling step. In subsequent research, we will address this problem to reduce the loss of information.
D. Fine-grained classification of more wound attributes
Adopting state-of-the-art fine-grained techniques on diabetic wound grading of other PWAT aspects such as size, necrotic tissue type, edges and periulcer skin viability will be addressed in our future research.
VI. CONCLUSION
In this paper, we proposed a fine-grained diabetic wound grading method based on the Bi-CNN deep neural network. To the best of our knowledge, this is the first attempt at using a fine-grained deep neural network for five-class wound healing grade classification. We evaluated the wound healing grades for wound depth and granulation tissue amount guided by ground truth labels provided by wound experts. We also adopted pre-processing techniques and modified the Bi-CNN architecture to better adapt it to the diabetic wound grading task. In comparisons with other commonly used CNN networks, our experimental results show the effectiveness of using a fine-grained deep neural network for the diabetic wound grading task.
The results of our proposed approach on diabetic wound grading reveal a promising direction for analyzing wound images that are highly similar and do not have obvious distinguishing visual features for classification. The generalization of the proposed approach for other medical imaging classification tasks is a subject for future work.
Acknowledgments
This work was supported in part by NIH/NIBIB under Grant 1R01EB025801-01. The authors also acknowledge financial support from China Scholarship Council (CSC).
Biography
Xixuan Zhao is currently pursuing the Ph.D. degree in Forestry Engineering with the Beijing forestry University, Beijing, China. Since November 2018, she has been a visiting scholar with Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA. Her research interests include deep learning and computer vision.
Ziyang Liu is currently pursuing the Ph.D. degree in Computer Science in Worcester Polytechnic Institute, MA, USA. His current research interests include computer vision and deep learning.
Emmanuel Agu received the Ph.D. degree in electrical and computer engineering from the University of Massachusetts Amherst, Amherst, MA, USA, in 2001. He is a Professor in the Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA. He has been involved in research in mobile and ubiquitous computing for over 16 years. He is currently working on mobile health projects to assist patients with diabetes, obesity, and depression.
Ameya Wagh received his MS degree in Robotics Engineering from Worcester Polytechnic Institute, Worcester, MA, USA in 2018. He currently works as a software engineer at TORC Robotics. His current research interests include computer vision and deep learning.
Shubham Jain received his MS degree in Robotics Engineering from Worcester Polytechnic Institute, Worcester, MA, USA in 2018. He currently works as a computer vision engineer at NVIDIA. His current research interests include autonomous driving and related computer vision problems.
Clifford Lindsay received the B.S. degree in Computer Science from the University of California, San Diego, San Diego, CA, USA, in 2001, and the Ph.D. degree in Computer Science from Worcester Polytechnic Institute, Worcester, MA, USA, in 2011. He is an assistant professor at UMass Medical School. He is currently working on applying computer vision and image processing methods to improve the quality of medical images.
Bengisu Tulu received her PhD in management of information systems and technology from Claremont Graduate University, CA, USA. She is an associate professor in the Foisie Business School at Worcester Polytechnic Institute, Worcester, MA, USA. She is one of the founding members of the Healthcare Delivery Institute at WPI. Her research interests include development and implementation of health information technologies and the impact of these implementations on healthcare organizations and consumers.
Diane Strong received the B.S. degree in mathematics and computer science from the University of South Dakota, Vermillion, SD, USA, in 1974, the M.S. degree in computer and information science from the New Jersey Institute of Technology, Newark, NJ, USA, in 1978, and the Ph.D. degree in information systems from the Tepper School of Business, Carnegie Mellon University, Pittsburgh, PA, USA, in 1989. Since 1995, she has been a Professor at Worcester Polytechnic Institute, Worcester, MA, USA, and is currently a Full Professor in the Foisie School of Business at WPI, where she is the Director of Information Technology Programs. She is a member of the Faculty Steering Committee of WPI’s Healthcare Delivery Institute. Her research has been concerned with effective use of IT in organizations and by individuals. Since 2006, she has focused on effectively using IT to promote health and support healthcare delivery.
Jiangming Kan received Ph.D. degree in forestry engineering from Beijing Forestry University, China in 2009. Currently, he is a professor in Beijing Forestry University. His research interests include computer vision and intelligent control.
References
- [1].Carracher AM, Marathe PH, and Close KL, ”International Diabetes Federation 2017,” J. Diabetes, vol. 10, no. 5, pp. 353–356, January. 2018. [DOI] [PubMed] [Google Scholar]
- [2].NIH’s National Diabetes Information Clearing House, National Institute of Health., 2011. [Online]. Available: www.diabetes.niddk.nih.gov.
- [3].Amin N and Doupis J, ”Diabetic foot disease: From the evaluation of the ”foot at risk” to the novel diabetic ulcer treatment modalities,” World J. Diabetes, vol. 7, no. 7, pp. 153–164, April. 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [4].Naves CCLM, ”The Diabetic Foot: A Historical Overview and Gaps in Current Treatment,” Adv. Wound Care, vol. 5, no. 5, pp.191–197, May 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Lemaster JW, Reiber GE, Smith DG, Heagerty PJ, and Wallace C, ”Daily weight-bearing activity does not increase the risk of diabetic foot ulcers,” Med. Sci. Sports Exerc, vol. 35, no. 7, pp. 1093–1099, July. 2003. [DOI] [PubMed] [Google Scholar]
- [6].Kloos C, Hagen F, Lindloh C, Braun A, and Muller UA, ”Cognitive function is not associated with recurrent foot ulcers in patients with diabetes and neuropathy,” Diabetes Care, vol. 32, no. 5, pp. 894–896, July. 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Kirsner RS and Vivas AC, ”Lower-extremity ulcers: diagnosis and management,” Br. J. Dermatol, vol. 173, no. 2, pp. 379–390, August. 2015. [DOI] [PubMed] [Google Scholar]
- [8].Gul A, Basit A, Ali SM, Ahmadani MY, and Miyan Z, ”Role of wound classification in predicting the outcome of Diabetic Foot Ulcer,” J. Pak. Med. Assoc, vol. 56, no. 10, pp. 444–447, October. 2006. [PubMed] [Google Scholar]
- [9].Falanga V, Saap LJ, and Ozonoff A, ”Wound bed score and its correlation with healing of chronic wounds,” Dermatol. Ther, vol. 19, no. 6, pp. 383–90, November. 2006. [DOI] [PubMed] [Google Scholar]
- [10].Houghton PE, Kincaid CB, Campbell KE, Woodbury MG, and Keast DH, ”Photographic assessment of the appearance of chronic pressure and leg ulcers,” Ostomy Wound Manag, vol. 46, no. 4, pp. 20–6, 28–30, April. 2000. [PubMed] [Google Scholar]
- [11].Houghton PE, Kincaid CB, Lovell M, Campbell KE, and Harris KA, ”Effect of Electrical Stimulation on Chronic Leg Ulcer Size and Appearance,” Phys. Ther, vol. 83, no. 1, pp. 17–28, January. 2003. [PubMed] [Google Scholar]
- [12].Thawer A, Houghton PE, Woodbury MG, Keast D, and Campbell K, ”A comparison of computer-assisted and manual wound size measurement.,” Ostomy Wound Manag, vol. 48, no. 10, pp. 46–53, October. 2002. [PubMed] [Google Scholar]
- [13].Mukherjee R, Manohar DD, Das DK, Achar A, Mitra A, and Chakraborty C, ”Automated tissue classification framework for reproducible chronic wound assessment,” Biomed. Res. Int, vol. 2014, no. 2014, pp. 1–9, July. 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [14].Hani AFM, Arshad L, Malik AS, Jamil A, and Bin FYB, ”Assessment of chronic ulcers using digital imaging,” in 2011 National Postgraduate Conference, September. 2011, pp. 1–5. [Google Scholar]
- [15].Noguchi H, Kitamura A, Yoshida M, Minematsu T, Mori T, and Sanada H, ”Clustering and classification of local image of wound blotting for assessment of pressure ulcer,” in Proc. WAC, Hawaii, USA, 2014. [Google Scholar]
- [16].Tchendjou GT, Alhakim R, Simeu E, and Lebowsky F, ”Evaluation of machine learning algorithms for image quality assessment,” in Proc. IEEE IOLTS, Alava, Spain, July. 2016, pp. 193–194. [Google Scholar]
- [17].Tran H, Le T, Le T, and Nguyen T, ”Burn image classification using one-class support vector machine,” in ICCASA, April. 2015, pp. 233–242. [Google Scholar]
- [18].Acha B, Serrano C, Acha JI, and Roa LM, ”CAD tool for burn diagnosis,” Inf. Process Med. Imaging, vol. 18, pp. 294–305, July. 2003. [DOI] [PubMed] [Google Scholar]
- [19].Goyal M, Reeves ND, Davison AK, Rajbhandari S, Spragg J, and Yap MH, ”DFUNet: Convolutional Neural Networks for Diabetic Foot Ulcer Classification,” IEEE Trans. Emerg. Top. Comput. Intell, pp. 1–12, September. 2018. [Google Scholar]
- [20].Badea M, Felea I, Florea L, and Vertan C, ”The use of deep learning in image segmentation, classification and detection,” in Proc. CVPR, Las Vegas, NV, USA, May 2016, pp. 1733–1740. [Google Scholar]
- [21].Zahia S, Sierra-Sosa D, Garcia-Zapirain B, and Elmaghraby A, ”Tissue classification and segmentation of pressure injuries using convolutional neural networks,” Comput. Methods Programs Biomed, vol. 2018, no. 159, pp. 51–58, March. 2018. [DOI] [PubMed] [Google Scholar]
- [22].Elmogy M, Garcia-Zapirain B, Burns C, Elmaghraby A, and Ei-Baz A, ”Tissues Classification for Pressure Ulcer Images Based on 3D Convolutional Neural Network,” Medical Biological Engineering Computing, vol. 56, no. 12, pp. 2245–2258, June. 2018. [DOI] [PubMed] [Google Scholar]
- [23].Nilsback ME and Zisserman A, ”A Visual Vocabulary for Flower Classification,” in Proc. CVPR, New York, NY, USA, June. 2006. [Google Scholar]
- [24].Xiaoling K, Cui X, and Bing N, ”Inception-v3 for flower classification,” in Proc. ICIVC, Chengdu, China, June. 2017, pp. 783–787. [Google Scholar]
- [25].Belhumeur PN, Chen D, Feiner S, Jacobs DW, and Ling Z, ”Searching the World’s Herbaria: A System for Visual Identification of Plant Species,” in Proc. ECCV, Marseille, France, pp 116–129, October. 2008. [Google Scholar]
- [26].Barre P, Stover BC, Muller KF, and Steinhage V, ”LeafNet: A computer vision system for automatic plant species identification,” Ecol. Inform, vol. 40, pp. 50–56, July. 2017. [Google Scholar]
- [27].Larios N, Soran B, Shapiro LG, Martinez-Munoz G, Lin J, and Dietterich TG, ”Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification,” in Proc. ICPR, Istanbul, Turkey, August. 2010, pp. 2624–2627. [Google Scholar]
- [28].Martinez-Munoz G, Delgado NL, Mortensen EN, Wei Z, and Dietterich TG, ”Dictionary-free categorization of very similar objects via stacked evidence trees,” in Proc. CVPR, Miami, Florida, USA, June. 2009, pp. 549–556. [Google Scholar]
- [29].Berg T and Belhumeur PN, ”POOF: Part-Based One-vs.-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation,” in Proc. CVPR, Portland, OR, USA, June. 2013, pp. 955–962. [Google Scholar]
- [30].Wen Y, Zhang K, Li Z, and Yu Q, ”A Discriminative Feature Learning Approach for Deep Face Recognition,” in Proc. ECCV, Amsterdam, The Netherlands, September. 2016, pp 499–515. [Google Scholar]
- [31].Berg T, Liu J, Lee SW, Alexander ML, Jacobs DW, and Belhumeur PN, ”Birdsnap: Large-Scale Fine-Grained Visual Categorization of Birds,” in Proc. CVPR, Columbus, OH, USA, June. 2014, pp. 2019–2026. [Google Scholar]
- [32].Zhang N, Donahue J, Girshick R, and Darrell T, ”Part-based R-CNNs for Fine-grained Category Detection.” in Proc. CVPR, Columbus, OH, USA, 2014, pp. 834–849. [Google Scholar]
- [33].Lazebnik S, Schmid C, and Ponce J, ”A maximum entropy framework for part-based texture and object recognition,” in Proc. ICCV, Nice, France, October. 2005, pp. 832–838. [Google Scholar]
- [34].Branson S, Van Horn G, Wah C, Perona P, and Belongie S, ”The Ignorant Led by the Blind: A Hybrid Human-Machine Vision System for Fine-Grained Categorization,” Int. J. Comput. Vision, vol. 108, no. 1–2, pp. 3–29. [Google Scholar]
- [35].Krause J, Gebru T, Deng J, Li LJ, and Li FF, ”Learning Features and Parts for Fine-Grained Recognition,” in Proc. ICPR, August. 2014, pp. 26–33. [Google Scholar]
- [36].Zhang N, Farrell R, Iandola F, and Darrell T, ”Deformable part descriptors for fine-grained recognition and attribute prediction,” in Proc. ICCV, Sydney, NSW, Australia, 2013, pp. 729–736. [Google Scholar]
- [37].Lin RY, Roychowdhury A, and Maji S, ”Bilinear CNN models for fine-grained visual recognition,” in Proc. CVPR, Boston, MA, USA, 2015, pp. 1449–1457. [Google Scholar]
- [38].Branson S, Van Horn G, Belongie S, and Perona P, ”Bird species categorization using pose normalized deep convolutional nets,” in Proc. BMVC, Nottingham, England, 2014. [Google Scholar]
- [39].Khosla A, Jayadevaprakash N, Yao B, and Li F-F, ”Novel Dataset for Fine-Grained Image Categorization: Stanford Dogs,” in Proc. CVPR, Columbus, OH, USA, June. 2014, pp. 1–2. [Google Scholar]
- [40].Liu J, Kanazawa A, Jacobs D, Belhumeur P ”Dog Breed Classification Using Part Localization,”. In Proc. ECCV, Florence, Italy, October. 2012. pp. 172–185. [Google Scholar]
- [41].Parkhi OM, Vedaldi A, Zisserman A, and Jawahar CV, ”Cats and dogs,” in Proc. CVPR, Providence, Rhode Island, June. 2012, pp. 3498–3505. [Google Scholar]
- [42].Gavves E, Fernando B, Snoek CGM, Smeulders AWM, and Tuytelaars T, ”Local Alignments for Fine-Grained Categorization,” Int. J. Comput. Vis, vol. 111, no. 2, pp. 191–212, January. 2015. [Google Scholar]
- [43].Krause J, Stark M, Deng J, and Fei-Fei L, ”3D object representations for fine-grained categorization,” in Proc. ICCV, Sydney, NSW, Australia, December. 2013, pp. 554–561. [Google Scholar]
- [44].Berg T, Berg AC, and Shih J, ”Automatic Attribute Discovery and Characterization,” in Proc. Eccv, Crete, Greece, September. 2010, pp.663–676. [Google Scholar]
- [45].Maji S, ”Discovering a lexicon of parts and attributes,” in Proc. Eccv, October. 2012, Berlin, Heidelberg, pp. 21–30. [Google Scholar]
- [46].Sudha N, Mohan AR, and Meher PK, ”A self-configurable systolic architecture for face recognition system based on principal component neural network,” IEEE Trans. Circuits Syst. Video Technol, vol. 21, no. 8, pp. 1071–1084, August. 2011. [Google Scholar]
- [47].Ruiz-Garcia A, Elshaw M, Altahhan A, and Palade V, ”A hybrid deep learning neural approach for emotion recognition from facial expressions for socially assistive robots,” Neural Comput. Appl, vol. 29, no. 7, pp. 359–373, April. 2018. [Google Scholar]
- [48].Nejati H, Ghazijahani HA, Abdollahzadeh M, Malekzadeh T, and Lian LL, ”Fine-grained wound tissue analysis using deep neural network,” in Proc. ICASSP, April. 2018, pp. 1010–1014. [Google Scholar]
- [49].Simonyan K and Zisserman A, ”Very Deep Convolutional Networks for Large-Scale Image Recognition,”arXiv preprint arXiv:1409.1556, 2014. [Google Scholar]
- [50].Fei-Fei L, Deng J, and Li K, ”ImageNet: Constructing a large-scale image database,” J. Vis, vol. 9, no. 8, pp. 1037–1037, August. 2010. [Google Scholar]
- [51].Wang L, Pedersen PC, Strong DM, Tulu B, Agu E, and Ignotz R, ”Smartphone-Based Wound Assessment System for Patients With Diabetes,”IEEE Trans. Biomed. Eng, vol. 62, no. 2, pp. 477–488, February. 2015. [DOI] [PubMed] [Google Scholar]
- [52].Wagh A, Jain S, Lindsay C, Liu Z, Agu E, Pedersen P, Strong D, and Tulu B, ”Semantic Segmentation of Wound Images: A Systematic Comparison of Convolutional Neural Networks and AHRF Approaches,” unpublished. [Google Scholar]
- [53].Tajbakhsh N et al. , ”Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning?,” IEEE Trans. Med. Imaging, vol. 35, no. 5, pp. 1299–1312, March. 2016. [DOI] [PubMed] [Google Scholar]
- [54].Kawahara J, BenTaieb A, and Hamarneh G, ”Deep features to classify skin lesions,” in Proc. ISBI, Prague, Czech Republic, April. 2016, pp. 1397–1400. [Google Scholar]
- [55].He K, Zhang X, Ren S, and Sun J, ”Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” in Proc. ICCV, Santiago, Chile, December. 2015, pp. 1026–1034. [Google Scholar]
- [56].Liu C, Wang W, Meng W, Lv F, and Konan M, ”An efficient instance selection algorithm to reconstruct training set for support vector machine,” Knowledge-Based Syst, vol. 116, no. 1, pp. 58–73, January. 2017. [Google Scholar]
- [57].Krizhevsky A, Sutskever I, and Hinton G, ”ImageNet Classification with Deep Convolutional Neural Networks,” in Proc. NIPS, Lake Tahoe, Nevada, USA, December. 2012, pp. 1097–1105. [Google Scholar]
- [58].He K, Zhang X, Ren S, and Sun J, ”Deep residual learning for image recognition,” in Proc. CVPR, Las vegas, NV, USA, June. 2016, pp. 770–778. [Google Scholar]
- [59].Huang G, Liu Z, Van Der Maaten L, and Weinberger KQ, ”Densely connected convolutional networks,” in Proc. CVPR, Hawaii, USA, June. 2017, pp. 4700–4708. [Google Scholar]
- [60].Matsunaga K, Hamada A, Minagawa A, Koga H., ”Image Classification of Melanoma, Nevus and Seborrheic Keratosis by Deep Neural Network Ensemble.” March. 2017, arXiv preprint: arXiv:1703.03108. [Google Scholar]
- [61].Barata C, Celebi ME, and Marques JS, ”Improving Dermoscopy Image Classification Using Color Constancy,” J. Biomed. Inform, vol. 19, no. 3, pp. 1146–1152, May 2015. [DOI] [PubMed] [Google Scholar]
- [62].Menegola A, Tavares J, Fornaciali M et al. ”RECOD Titans at ISIC Challenge 2017.” March. 2017, arXiv preprint: arXiv:1703.04819. [Google Scholar]
















