Abstract
Environmental Enteropathy (EE) and celiac disease (CD) are gastrointestinal conditions that adversely impact the growth of children. EE is prevalent in low- and middle-income countries, whereas CD is prevalent worldwide. The histologic appearance of duodenal EE biopsies overlaps significantly with celiac enteropathy. We propose a convolutional neural network (ConvNet) to classify EE cases from Pakistani infants along with celiac and healthy controls from the United States. We also identified areas of biopsies that generate high activation values in the ConvNet model. The identified features helped distinguish EE and celiac from healthy intestinal tissues. This work advances the understanding of both diseases and provides a potential screening and diagnostic tool for practitioners.
Introduction
Clinical gastroenterologists, researchers, and GI pathologists studying enteropathies face a real-world problem: interpreting clinical biopsy images to diagnose disease is difficult when the histopathology of distinct but related conditions overlaps, often strikingly. There is a major clinical need for new data science methods that allow clinicians to translate heterogeneous biomedical images and data extracted from patient samples into accurate, quantitative, and precise diagnostics. The development of such processes for high-dimensional clinical research data will support the progress of “precision medicine”, with improved diagnostics, treatments, and clinical outcomes for patients. We propose to develop an image analysis platform for the automated extraction of quantitative morphologic phenotypes from gastrointestinal (GI) biopsy images. This method will capture complex GI disease phenotypes that cannot be measured directly using molecular approaches. In addition, the process of obtaining a report and diagnosis from a biopsy can take up to 10 days. We want to automate the classification of diseased as well as normal tissue so that the reporting workflow can be better triaged. Pathologists are essential in the diagnosis of these diseases, and we believe that our tool can lead to better utilization and prioritization of their time. Further, the diseases in question, celiac disease and environmental enteropathy, overlap histologically. The proposed model would not only help classify biopsies but also identify novel features that aid pathologists in making these diagnoses.
Globally, undernutrition is implicated in 45% of the 5 million deaths annually in children under 5 years of age. Linear growth failure (stunting, length-for-age Z [LAZ] score <-2) is a common manifestation of early childhood undernutrition, afflicting ~156 million children under 5 years of age worldwide. Stunting serves as a clinical marker for lifelong impairments in physical, immunological (including oral vaccine failure), neurocognitive, and socioeconomic potential. A common cause of stunting in the United States, with an estimated 1% prevalence, is celiac disease (CD), a gluten-mediated enteropathy. Interestingly, environmental enteropathy (EE) of the small intestine (SI), a condition prevalent in low- and middle-income countries (LMICs), shares many features with CD, which we focus on as a disease control.
Convolutional neural networks (ConvNets), introduced in1, have received great attention after their robust success in classifying images in the ImageNet challenge2. Subsequently, ConvNets have been applied in many areas, mainly image and text analysis, across numerous applications3–8. In the domain of medical imaging, researchers have found that ConvNets outperform other traditional machine learning models9–16.
The main contributions of this paper include:
A new application of ConvNets to classify small intestinal biopsies from children into three categories: healthy, celiac disease (CD) and environmental enteropathy (EE); and
Identifying micro-level features that would serve as indicators of CD and EE.
Related Works
Convolutional neural networks have been widely used to segment and classify various types of medical images such as histopathological images of breast tissues10, 11, 13, 14, 17, lung images15, skin images9, MR brain images16, or colon tissues13.
In10, the authors used ConvNets to build a pixel-based classifier to detect mitosis in breast histology images. They trained and tested their method on small patches centered on the pixel of interest. Their method outperformed other classifiers and won the ICPR 2012 mitosis detection competition. This work shows the effectiveness of using neural networks to analyze medical images. Unlike their method, we propose an image-based classifier, which introduces an additional layer of difficulty since all patterns throughout the image are considered in producing the predicted label.
Cruz-Roa et al. proposed an image-based ConvNet model to detect basal cell carcinoma cancer9. They applied their method to a relatively small data set of 1,417 images of skin histopathology slides. They showed that ConvNets outperform other methods such as bag of features, discrete cosine transform, and Haar-based wavelet transform. The analysis by Cruz-Roa et al. supports our motivation for using ConvNets to classify normal, EE, and CD tissue even though our data set is relatively small.
Similar to10, the authors in11 used ConvNets to detect ductal carcinoma in histological slides from biopsies. However, instead of using a pixel-based classifier, the authors built a grid of 100 × 100 patches on top of input slides and trained their ConvNets on these patches individually. They trained and tested their method on 113 and 49 slide images, respectively. Their approach outperformed other methods that use handcrafted features such as color histograms and graph-based features.
In12, Pan et al. used ConvNets to segment nuclei in pathological images. However, their ConvNet framework included neither fully connected nor softmax layers. They trained their network on random 18 × 18 patches by taking the pixel-wise difference between the final convolutional layer and the ground truth. A major difference between12 and other related works is that they use a two-step process to convert their input images to gray scale before feeding them to the ConvNet. Although such preprocessing steps simplify the learning problem, they can negatively impact performance, as most color-related information is lost. On the other hand, Xu et al. used a similar approach but on colored patches to segment epithelial and stromal regions in hematoxylin and eosin and immunohistochemistry images of breast and colon cancer13. Unlike their work, our model uses ground-truth labels and generates aggregated predictions at the level of the biopsy rather than of a smaller segment.
Data Sources and Preprocessing
We integrated and used data from three different sources. The first source includes images scanned from 34 biopsies from patients with celiac disease (12 male and 22 female) and 42 healthy control patient cases, both from US children. These cases were manually scanned using our UVA Bio Tissue Repository Core facilities at multiple resolutions. The process produced 1,000 celiac images and 1,008 normal images, each of size 1360 × 1024 pixels (example shown in Figure 1). Second, we obtained 10 cases of EE disease from Pakistani children18. For each case, 2 to 4 multiresolution z-level biopsies were taken. Such biopsies were scanned at relatively high resolutions (e.g., ranging from 2288 × 1356 to 18304 × 14926 pixels). Although most discriminating features can only be observed at a high resolution, it is impractical to feed such images as input to the network. Therefore, we developed a segmentation method to automatically divide each z-level image into a number of 1360 × 1024 images. We chose the split size to produce segments that are consistent with the celiac and normal images, although our classification framework can work with variable-sized input. The segmentation process consisted of: 1) Estimation of the background color: we calculated four estimates of the background color by taking the average RGB values of 50 × 50 pixels from the four corners of each image; 2) Binarization of images: for each pixel in the input image, we checked whether its RGB values were within three standard deviations of the estimated background averages. If this condition was satisfied for at least two of the four background estimates, we converted the pixel to white (i.e., background pixel); otherwise, the pixel was converted to black (i.e., biopsy pixel); 3) Finding non-overlapping boundary and in-biopsy segments: we used a 1360 × 1024 window and counted the number of biopsy pixels. If the number of biopsy pixels in a given window was within 45% to 55% of all pixels, we tagged that window as a boundary segment. On the other hand, windows were tagged as biopsy segments if the number of biopsy pixels was at least 95% of all pixels. Figure 2 shows examples of manually segmented celiac and normal biopsies as well as the automatic segmentation process for EE biopsies. This method generated 809 images from the 10 EE cases. The motivation for designing an automatic segmentation process was scalability, as we plan to extend the existing work to incorporate EE cohorts from other countries including Bangladesh and Zambia.
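For concreteness, the following is a minimal sketch of this segmentation heuristic, assuming NumPy and Pillow. The 50 × 50 corner samples, the three-standard-deviation test, the two-of-four vote, and the 45–55% / 95% window thresholds follow the description above; function names, the handling of zero-variance backgrounds, and file I/O are illustrative.

```python
import numpy as np
from PIL import Image

SEG_W, SEG_H = 1360, 1024  # target segment size, matching the celiac/normal images

def binarize(img_rgb, corner=50, n_std=3):
    """Mark each pixel as background (True) or biopsy (False) using
    background-color estimates taken from the four image corners."""
    h, w, _ = img_rgb.shape
    corners = [img_rgb[:corner, :corner], img_rgb[:corner, -corner:],
               img_rgb[-corner:, :corner], img_rgb[-corner:, -corner:]]
    votes = np.zeros((h, w), dtype=int)
    for c in corners:
        mean, std = c.reshape(-1, 3).mean(0), c.reshape(-1, 3).std(0)
        close = np.all(np.abs(img_rgb - mean) <= n_std * std + 1e-6, axis=-1)
        votes += close.astype(int)
    return votes >= 2  # background if close to at least two corner estimates

def tag_segments(background):
    """Slide a non-overlapping 1360x1024 window and tag each window as
    'boundary' (45-55% biopsy pixels) or 'biopsy' (>=95% biopsy pixels)."""
    h, w = background.shape
    tags = []
    for y in range(0, h - SEG_H + 1, SEG_H):
        for x in range(0, w - SEG_W + 1, SEG_W):
            frac = 1.0 - background[y:y + SEG_H, x:x + SEG_W].mean()
            if 0.45 <= frac <= 0.55:
                tags.append((x, y, "boundary"))
            elif frac >= 0.95:
                tags.append((x, y, "biopsy"))
    return tags

img = np.asarray(Image.open("ee_case_z0.png").convert("RGB"), dtype=float)
segments = tag_segments(binarize(img))
```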
Figure 1:
An example of a manually segmented normal biopsy in the laboratory.
Figure 2:
An illustration of automatic segmentation of EE biopsies into 1360 × 1024 non-overlapping images.
Data Augmentation
We used two data augmentation methods to increase the amount of training data and avoid overfitting caused by limited data. First, during each training iteration, and for each input image, we applied gamma correction with a random gamma value between 0.5 and 2. Then, we applied standard shifting and rotation methods such that we randomly selected ten 1000 × 1000 patches and their horizontal and vertical reflections2. These methods increased the size of our data by a factor of 30 and helped in learning translation and rotation invariant features (example shown in Figure 3). At testing time, we generated 45 patches from each image segment: the central patch, 4 corner patches, their reflections, and gamma corrections with values 0.5, 1, and 2. Then, we output the average probabilities over these patches. Finally, since each case consisted of a number of segments, we calculated the average softmax probabilities over all segments and used that average to classify testing cases.
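A minimal sketch of the two augmentation schemes, assuming RGB images stored as NumPy arrays. The gamma range, the 1000 × 1000 patch size, and the 45 test-time patches follow the text; the power-law form of the gamma adjustment and all function names are illustrative choices.

```python
import random
import numpy as np

def random_augment(img, patch=1000):
    """Training-time augmentation: random gamma in [0.5, 2], a random
    1000x1000 crop, and a randomly chosen reflection."""
    gamma = random.uniform(0.5, 2.0)
    img = np.clip(255.0 * (img / 255.0) ** gamma, 0, 255)
    h, w, _ = img.shape
    y, x = random.randint(0, h - patch), random.randint(0, w - patch)
    crop = img[y:y + patch, x:x + patch]
    flip = random.choice(["none", "horizontal", "vertical"])
    if flip == "horizontal":
        crop = crop[:, ::-1]
    elif flip == "vertical":
        crop = crop[::-1, :]
    return crop

def test_time_patches(img, patch=1000):
    """Test-time patches: central + 4 corner crops, their reflections, and
    gamma values {0.5, 1, 2}; predictions are averaged over all 45 patches."""
    h, w, _ = img.shape
    offsets = [(0, 0), (0, w - patch), (h - patch, 0),
               (h - patch, w - patch), ((h - patch) // 2, (w - patch) // 2)]
    out = []
    for gamma in (0.5, 1.0, 2.0):
        g = np.clip(255.0 * (img / 255.0) ** gamma, 0, 255)
        for y, x in offsets:
            crop = g[y:y + patch, x:x + patch]
            out.extend([crop, crop[:, ::-1], crop[::-1, :]])
    return out  # 3 gammas x 5 crops x 3 orientations = 45 patches
```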
Figure 3:
Top 3 activations of a testing celiac image and its horizontal and vertical reflections. This highlights that the data augmentation method allowed the network to learn rotation and translation invariant features.
Classification Framework
We proposed a convolutional neural network to perform multi-class classification on biopsies (see Figure 4). The proposed network consisted of the following components (a code sketch of the architecture follows the list):
Four convolution layers with 16, 16, 32, and 32 feature maps respectively and filter sizes of 5 × 5, 5 × 5, 5 × 5, and 3 × 3 respectively. Note, before each convolution, we zero-padded input layers to ensure that both the input and output of each convolution would have the same size.
Each convolution layer was followed by a rectified linear unit (ReLU) layer19 and then a max pooling layer. The window sizes for the four max pooling layers were set to 2 × 2, 4 × 4, 5 × 5, and 5 × 5 respectively. Note, the stride in the convolution layers was set to 1, while each max pooling layer used non-overlapping windows (i.e., a stride equal to its window size). Given a 1,000 × 1,000 × 3 colored input image, the fourth max pooling layer would therefore generate an output of size 5 × 5 for each feature map.
We flattened and concatenated the output of the 32 feature maps and connected them to one fully connected layer with 1,024 neurons.
A dropout layer with a dropout probability of 0.5, as proposed in20.
A softmax layer that would generate three probability values corresponding to the likelihood of an input image being a healthy control, CD, or EE.
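The following PyTorch module is a sketch of this architecture. Feature map counts, filter sizes, pooling windows, the 1,024-unit fully connected layer, the dropout rate, and the 3-way softmax follow the list above; padding values and other details not stated in the paper (e.g., where the softmax is applied, initialization, optimizer) are assumptions.

```python
import torch
import torch.nn as nn

class BiopsyNet(nn.Module):
    """Sketch of the network described above: four convolutions with 16, 16,
    32, 32 feature maps and 5x5, 5x5, 5x5, 3x3 filters ('same' zero padding),
    each followed by ReLU and max pooling (2, 4, 5, 5), then a 1,024-unit
    fully connected layer, dropout of 0.5, and a 3-way output."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 16, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(5),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(5),
        )
        # a 1000x1000 input is reduced to 5x5 per feature map: 1000/2/4/5/5 = 5
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 5 * 5, 1024),   # the paper states no activation here
            nn.Dropout(0.5),
            nn.Linear(1024, n_classes),
        )

    def forward(self, x):
        # the softmax is applied outside the model (or folded into the loss)
        return self.classifier(self.features(x))

logits = BiopsyNet()(torch.randn(1, 3, 1000, 1000))
probs = torch.softmax(logits, dim=1)  # healthy / CD / EE probabilities
```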
Figure 4:
Convolutional neural network classification and visualization framework. The network consists of four convolution layers and one fully connected layer. Note, CRMP stands for the output of convolution, ReLU, and max pooling respectively.
We also added two additional components to the network to allow visualization of high activations. The first component is similar in concept to deconvolutional neural networks21. However, instead of deconvoluting the entire output layer, we only traced the highest activation of each feature map back to the source image. This process allowed us to find segments of input images that activated the feature maps the most. Since the fourth max pooling layer generated a 5 × 5 output, we could trace the maximum value back to a segment of 262 × 262 pixels (roughly one-fourth of the input image's width). However, tracing a high activation back to such a relatively large segment may not reveal important indicators of EE or celiac. Therefore, we chose to trace activations from the output of the ReLU units at layer 4, which has an output size of 25 × 25. The maximum activation from that layer can be traced back to a segment of 142 × 142 pixels.
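A hedged sketch of this activation tracing, reusing the hypothetical BiopsyNet module from the architecture sketch above: the maximum of one 25 × 25 layer-4 feature map is located and mapped back to an approximate window in the input. The 40-pixel mapping stride (the cumulative downsampling of 2 × 4 × 5 before layer 4) and the 142-pixel window size follow the text; the centering and border clipping are our own simplifications.

```python
import torch

def top_activation_patch(model, x, fmap_idx, patch=142, stride=40):
    """Locate the maximum activation of one layer-4 feature map (25x25 after
    the ReLU) and map it back to an approximate window of the 1000x1000 input."""
    acts = {}
    handle = model.features[10].register_forward_hook(   # ReLU after conv4
        lambda m, i, o: acts.setdefault("relu4", o.detach()))
    with torch.no_grad():
        model(x)
    handle.remove()
    fmap = acts["relu4"][0, fmap_idx]                    # shape (25, 25)
    r, c = divmod(int(fmap.argmax()), fmap.shape[1])
    cy, cx = r * stride + stride // 2, c * stride + stride // 2  # input-space center
    y0, x0 = max(cy - patch // 2, 0), max(cx - patch // 2, 0)
    # windows near the image border may be clipped to less than patch x patch pixels
    return x[0, :, y0:y0 + patch, x0:x0 + patch]
```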
While the first component can highlight areas of the biopsies that produce high activations, it does not account for the fully connected and softmax layers. Therefore, high activations might not correspond to the final probability of EE or celiac. To resolve this, we used Gradient-weighted Class Activation Maps (Grad-CAMs), which use gradients to calculate importance weights22. Figure 5 shows our Grad-CAM framework. To obtain these visualizations, we go through the following process: 1) given an input image and one of the class labels, we perform a forward pass to obtain the score for that class label; 2) gradients of that score are back-propagated to the fourth convolutional layer; these gradients represent the importance of each pixel in each filter with respect to the class label; 3) we compute a weighted sum of all filter values using the gradient weights; 4) we apply a ReLU to keep only positive values (i.e., keep only pixels with positive importance with respect to the given class label), generating a non-negative 25 × 25 heatmap representing the importance of the pixel values of the convoluted 25 × 25 image; 5) we upscale the heatmap to match the input size (i.e., 1,000 × 1,000); and 6) we use the heatmap as a transparency layer for the input image such that each pixel with lower importance has higher transparency and becomes less apparent, and vice versa.
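A sketch of this procedure, again assuming the hypothetical BiopsyNet module from above. It follows the per-pixel weighting described in the steps (gradients multiplied elementwise with layer-4 activations, summed across filters, passed through a ReLU, and upscaled); note that the standard Grad-CAM formulation22 first averages the gradients per filter, so this is a sketch of the variant described here rather than a reference implementation.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, class_idx):
    """Class-specific heatmap following the six steps above: back-propagate
    the class score to the layer-4 activations, weight the activations by
    their gradients, keep positive values, and upscale to the input size."""
    acts = {}

    def keep(module, inputs, output):
        output.retain_grad()            # so .grad is populated after backward()
        acts["relu4"] = output

    handle = model.features[10].register_forward_hook(keep)  # ReLU after conv4
    logits = model(x)
    handle.remove()
    logits[0, class_idx].backward()     # gradients of the chosen class score

    a, g = acts["relu4"][0], acts["relu4"].grad[0]    # each of shape (32, 25, 25)
    cam = F.relu((a * g).sum(dim=0))                  # per-pixel weighted sum + ReLU
    cam = cam / (cam.max() + 1e-8)                    # normalize to [0, 1]
    cam = F.interpolate(cam[None, None], size=x.shape[-2:],
                        mode="bilinear", align_corners=False)[0, 0]
    return cam  # 1000x1000 importance map, used as an alpha/transparency layer
```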
Figure 5:
Grad-CAM visualization framework. Gradients of the class score are back-propagated to the fourth convolutional layer, combined with that layer's activations, and upscaled to produce a class-specific importance heatmap over the input image.
Experimental Results
We performed empirical evaluations of the proposed classification and visualization models. We used a biopsy-preserving 10-fold cross validation (CV) experimental setup, i.e., all images from a given biopsy were either in training or testing. As for EE cases, since each patient had more than one biopsy, our CV split preserved the patient ID such that images from all biopsies for a given EE case were either in training or testing. On average, each fold included 76,059 training images and 8,451 testing images. Each model trained for about 20 hours. In the following experiments, we evaluated performance per image and per biopsy.
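A biopsy-preserving split of this kind can be sketched with scikit-learn's GroupKFold, where the group ID is the biopsy (or the patient ID for EE cases); the toy data below are purely illustrative.

```python
from sklearn.model_selection import GroupKFold

# Toy stand-ins: one entry per image segment. 'groups' carries the biopsy ID
# (or the patient ID for EE cases, so all of a patient's biopsies stay together).
images = [f"segment_{i}.png" for i in range(12)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]        # 0 = healthy, 1 = CD, 2 = EE
groups = ["b1", "b1", "b2", "b2", "b3", "b3", "b4", "b4",
          "p1", "p1", "p2", "p2"]

cv = GroupKFold(n_splits=4)   # the actual experiments use 10 folds
for fold, (train_idx, test_idx) in enumerate(cv.split(images, labels, groups)):
    # every group falls entirely in the training set or entirely in the test set
    train_images = [images[i] for i in train_idx]
    test_images = [images[i] for i in test_idx]
```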
First, we checked the effect of adjusting gamma on performance. We also tested contrast limited adaptive histogram equalization (CLAHE)23, as well as the combination of gamma correction and CLAHE. However, using only gamma correction provided the best performance. Table 1 shows the 10-fold CV performance of ConvNet models with four data augmentation approaches: one with shifting and horizontal and vertical reflections, and the others with gamma, CLAHE, and both adjustments added to the first one. Most practitioners use only shifting and rotation to augment data. Our results indicate that altering colors has a significant impact on prediction performance, and therefore there is value in adjusting colors as part of the data augmentation process.
Table 1:
Ten fold cross validation classification performance of ConvNet models.
| Augmentation Method | Per Image Mean Accuracy | Per Image Variance | Per Biopsy Mean Accuracy | Per Biopsy Variance |
|---|---|---|---|---|
| Shifting + Reflections | 85.13% | 0.0231 | 85.71% | 0.0264 |
| + Gamma | 90.59% | 0.0070 | 93.33% | 0.0096 |
| + CLAHE | 86.79% | 0.0164 | 88.57% | 0.0188 |
| + Gamma + CLAHE | 86.72% | 0.0152 | 87.61% | 0.0199 |
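For illustration, the color adjustments compared in Table 1 could be implemented with OpenCV as sketched below. Applying CLAHE to the lightness channel of the Lab representation is our assumption, since the text does not specify the color space, and the clip limit and tile grid are OpenCV defaults rather than values from the paper.

```python
import cv2
import numpy as np

def adjust_gamma(img_bgr, gamma):
    """Power-law (gamma) adjustment of an 8-bit image via a lookup table."""
    table = (255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return cv2.LUT(img_bgr, table)

def apply_clahe(img_bgr, clip=2.0, grid=(8, 8)):
    """CLAHE on the lightness channel of the Lab representation."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2Lab)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=grid)
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_Lab2BGR)

img = cv2.imread("segment.png")
variants = {
    "gamma": adjust_gamma(img, 0.5),
    "clahe": apply_clahe(img),
    "gamma+clahe": apply_clahe(adjust_gamma(img, 0.5)),
}
```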
We also generated confusion matrices to understand where most incorrect classifications occur (see Table 2). Most misclassifications are between celiac and healthy controls. Overall, our model has a false negative rate of 1.9%.
Table 2:
Aggregated confusion matrices of 10-folds cross validation of ConvNet models.
To better understand the type of discriminative features that our model learned, we visualized feature map activations. For this experiment, we trained a single model on the entire data set. Then, we visualized the top nine image segments from each of the 32 feature maps at layer 4 with respect to the different classes (see Figure 6). We observed that various feature maps learned distinctive patterns, and as a result, the top nine activations correspond to relatively similar segments from the training data. Also, Figure 7 shows examples of micro-level features identified by experts to be of high importance to the diagnosis of EE and celiac. Finally, we generated Grad-CAM visualizations (examples shown in Figure 8), which highlighted the relative importance of each micro-level feature with respect to the disease type. The ConvNet model learned such features automatically from the data. These observations support our argument for using ConvNets not only as a diagnostic assistant but also for understanding the micro-level indicators of the various diseases. Furthermore, the distinctive patterns learned by the model support our motivation for correlating these feature maps with biomarkers. Besides the micro-level features, Grad-CAMs showed that our model learned from the data that background pixels, regardless of their location, contribute little to nothing to the diagnosis of EE and celiac.
Figure 6:
Visualization of top 9 activations for healthy, celiac and EE biopsies at layer 4. Note, since high activations can be traced back to boundary segments, their actual size can be less than 142 × 142 pixels.
Figure 7:
Examples of micro-level features identified by the ConvNet. Highlighted sections correspond to the highest activation values.
Figure 8:
Examples of Grad-CAM and Alpha Grad-CAM visualizations that highlight the importance of pixels with respect to a specific class label. Colors ranging from dark red (high activation) to dark blue (low activation) are added as an alpha channel to increase the transparency of less important micro-level features.
Conclusions
In this article, we have proposed an image analysis platform for the automated extraction of quantitative morphologic phenotypes from gastrointestinal (GI) biopsy images using a convolutional neural network. We trained and tested our models on 2,817 images from 105 biopsies. We used data augmentation approaches to increase the number of training instances and to learn translation and rotation invariant features. We also visualized micro-level features by tracing high activation values back to the input images.
Future directions of this work include: 1) extending our study to include biopsies from Zambia and Bangladesh; 2) incorporating a wider range of biomarkers such as measures of children’s growth; urinary lactulose:rhamnose measurements and biomarkers of gut absorption and barrier function; total, protein, fat, and carbohydrate energy content; microbiome-for-age Z scores and bile acid deconjugation; 3) controlling for gender in the analysis; and 4) building a holistic framework to predict celiac and EE using both available biomarkers and biopsies.
Acknowledgments
The authors would like to thank all the members of the Aga Khan University Field Research team without whose hard work and dedication this project could not have been completed: Community Health Workers (led by Sadaf Jakro), Matiari field site coordinators (Tauseef Akhund, Fayaz Umrani, Sheraz Ahmed), Infectious Disease Research Laboratory (Aneeta Hotwani) and data management team (Najeeb Rahman). The authors would also like to thank Drs. Kamran Sadiq, Najeeha T. Iqbal, and Christopher A. Moskaluk for their invaluable input. We appreciate support from the UVa Bio Tissue Repository Core (Pat Pramoonjago). Finally, an immense thank you to all the family members of the children who participated in this study and the Government of Pakistan for their help in reaching these rural communities. This work was funded by UVa Engineering-in-Medicine seed grant (SSyed, DBrown); Bill & Melinda Gates Foundation (AA:OPP1066203).
References
- 1. Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86(11):2278–2324.
- 2. Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems. 2012:1097–1105.
- 3. Dan C Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, Jürgen Schmidhuber. Flexible, high performance convolutional neural networks for image classification. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence; Barcelona, Spain; 2011. p. 1237.
- 4. Kathy Lee, Ashequl Qadir, Sadid A Hasan, Vivek Datla, Aaditya Prakash, Joey Liu, Oladimeji Farri. Adverse drug event detection in tweets with semi-supervised convolutional neural networks. In: Proceedings of the 26th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee; 2017. pp. 705–714.
- 5. Ignacio Rocco, Relja Arandjelović, Josef Sivic. Convolutional neural network architecture for geometric matching. arXiv preprint arXiv:1703.05593. 2017.
- 6. Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, Ming-Hsuan Yang. Single image dehazing via multi-scale convolutional neural networks. In: European Conference on Computer Vision. Springer; 2016. pp. 154–169.
- 7. Tobias Weyand, Ilya Kostrikov, James Philbin. PlaNet - photo geolocation with convolutional neural networks. In: European Conference on Computer Vision. Springer; 2016. pp. 37–55.
- 8. Donghyeon Cho, Yu-Wing Tai, In So Kweon. Natural image matting using deep convolutional neural networks. In: European Conference on Computer Vision. Springer; 2016. pp. 626–643.
- 9. Angel Alfonso Cruz-Roa, John Edison Arevalo Ovalle, Anant Madabhushi, Fabio Augusto González Osorio. A deep learning architecture for image representation, visual interpretability and automated basal-cell carcinoma cancer detection. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2013. pp. 403–410.
- 10. Dan C Cireşan, Alessandro Giusti, Luca M Gambardella, Jürgen Schmidhuber. Mitosis detection in breast cancer histology images with deep neural networks. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2013. pp. 411–418.
- 11. Angel Cruz-Roa, Ajay Basavanhally, Fabio González, Hannah Gilmore, Michael Feldman, Shridar Ganesan, Natalie Shih, John Tomaszewski, Anant Madabhushi. Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. In: SPIE Medical Imaging, volume 9041. International Society for Optics and Photonics; 2014. p. 904103.
- 12. Xipeng Pan, Lingqiao Li, Huihua Yang, Zhenbing Liu, Jinxin Yang, Lingling Zhao, Yongxian Fan. Accurate segmentation of nuclei in pathological images via sparse reconstruction and deep convolutional networks. Neurocomputing. 2017;229:88–99.
- 13. Jun Xu, Xiaofei Luo, Guanhao Wang, Hannah Gilmore, Anant Madabhushi. A deep convolutional neural network for segmenting and classifying epithelial and stromal regions in histopathological images. Neurocomputing. 2016;191:214–223.
- 14. Nuh Hatipoglu, Gokhan Bilgin. Classification of histopathological images using convolutional neural network. In: Image Processing Theory, Tools and Applications (IPTA), 2014 4th International Conference on. IEEE; 2014. pp. 1–6.
- 15. Qing Li, Weidong Cai, Xiaogang Wang, Yun Zhou, David Dagan Feng, Mei Chen. Medical image classification with convolutional neural network. In: Control Automation Robotics & Vision (ICARCV), 2014 13th International Conference on. IEEE; 2014. pp. 844–848.
- 16. Pim Moeskops, Max A Viergever, Adriënne M Mendrik, Linda S de Vries, Manon JNL Benders, Ivana Išgum. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transactions on Medical Imaging. 2016;35(5):1252–1261.
- 17. Babak Ehteshami Bejnordi, Mitko Veta, Paul Johannes van Diest, Bram van Ginneken, Nico Karssemeijer, Geert Litjens, Jeroen AWM van der Laak, Meyke Hermsen, Quirine F Manson, Maschenka Balkenhol, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA. 2017;318(22):2199–2210.
- 18. Sana Syed, Sunil Yeruva, Jeremy Herrmann, Anne Sailer, Kamran Sadiq, Najeeha Iqbal, Furqan Kabir, Shahida M Qureshi, Sean R Moore, Jerrold R Turner, et al. Environmental enteropathy in Pakistani children: clinical profile and histomorphometric analysis. Gastroenterology. 2017;152(5):S437–S438.
- 19. Vinod Nair, Geoffrey E Hinton. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010. pp. 807–814.
- 20. Geoffrey E Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580. 2012.
- 21. Matthew D Zeiler, Rob Fergus. Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. Springer; 2014. pp. 818–833.
- 22. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra. Grad-CAM: visual explanations from deep networks via gradient-based localization.
- 23. Ali M Reza. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology. 2004;38(1):35–44.









