Summary
The analysis of microcirculation images has the potential to reveal early signs of life-threatening diseases such as sepsis. Quantifying the capillary density and the capillary distribution in microcirculation images can serve as a biological marker to assist in assessing critically ill patients. The quantification of these biological markers is labor-intensive, time-consuming, and subject to interobserver variability. Several computer vision techniques with varying performance can be used to automate the analysis of these microcirculation images in light of the stated challenges. In this paper, we survey over 50 research papers and present the most relevant and promising computer vision algorithms for automating the analysis of microcirculation images. Furthermore, we survey the methods currently used by other researchers to automate this analysis. This survey is of high clinical relevance because it acts as a guidebook of techniques for other researchers developing their own microcirculation analysis systems and algorithms.
Keywords: image analysis, literature survey, microcirculation analysis
The bigger picture
The analysis of microcirculation images has the potential to reveal early signs of life-threatening diseases. Quantifying the capillary distribution in microcirculation images can be used as a biological marker to assist patients. The quantification of these biological markers is labor-intensive, time-consuming, and subject to interobserver variability. Moreover, manual analysis has been reported to hinder the application of microvascular microscopy in a clinical environment. Several computer vision techniques with varying performance can be used to automate the analysis of these microcirculation images. Computer vision algorithms are faster than convolutional neural networks for capillary detection but have poorer accuracy. Convolutional neural networks are more accurate but slower and require large amounts of training data. Therefore, by creating a hybrid model combining both computer vision algorithms and convolutional neural networks, one can strike a balance between accuracy and speed.
The quantification of capillaries in microcirculation images can potentially reveal life-threatening diseases. The quantification of these biological markers is labor-intensive, time-consuming, and subject to interobserver variability. Several computer vision techniques with varying performance can be used to automate the analysis of these microcirculation images and bring microcirculation analysis closer to clinical practice.
Introduction
Arterioles, venules, and capillaries comprise most of the microvessels known as the microcirculation system.1 These microvessels measure less than 20 μm.1 Microcirculation, a main component of the cardiovascular system, is responsible for delivering nutrients, hormones, and oxygen to the tissues.2,3 The microcirculation system is essential for the body’s ability to control blood pressure and sustain tissue function.4,5 In addition, it aids in clearing out waste from the cell tissues.6 Arterioles carry oxygenated blood to the capillaries, while venules carry away deoxygenated blood.1 According to research conducted on microcirculation, a low number of healthy capillaries might indicate some underlying medical conditions.7,8,9,10,11,12,13,14,15,16,17,18,19 In addition, the thin cell walls of the capillaries make it possible for carbon dioxide and oxygen to cross the capillary walls.20 The delivery of oxygen supports the tissue cells’ functional activity by providing the energy they need.21 The parenchymal cells depend on the efficient functioning of the microcirculation to stay alive and sustain their function.21 Thus, the microcirculation system is an essential component of the cardiovascular system. The functional capillary density of the microcirculation system can be used to measure the oxygen transport to the cells, which is critical for cell survival;22 thus, methods to measure the capillary density are the main topic of this paper. Other vital microcirculation tasks include but are not limited to the management of solute exchange in the parenchymal space,1 delivery of white blood cells to their target tissue,23 delivery of all blood-borne hormones, and modulation of hemostasis.1
One of the functions of capillaries is to deliver oxygen to the cells of all organs.24 In the past decades, research has been conducted to elucidate whether monitoring microcirculation can be used to diagnose or assess the severity of various diseases.25,26 One of the first study directions was to record blood movement in the nail fold capillaries to assess the severity of rheumatic diseases, such as systemic sclerosis, Raynaud’s syndrome, and dermatomyositis.27 These studies revealed that patients with rheumatic diseases had altered microcirculation, showing dilated and distorted capillary loops and areas with low capillary density compared with controls.27 More recently, studies on the effect of nonrheumatic diseases on microcirculation have been conducted. Microcirculation has been monitored either on the skin surface28 or sublingually2 using video recordings of segments of blood flow within the capillaries. The impact of various diseases on microcirculation has been assessed, including sepsis29,30 and, more recently, COVID.31,32 These studies have reported various correlations between diseases and the density of capillaries, the velocity of blood flow, and the heterogeneity of perfusion.2,27,28,31,32
Recently, the microvascular community has been focused on standardizing the analysis of microcirculation images33 by using automated methods34,35,36 (see state-of-the-art microcirculation image analysis techniques). This is of exceptionally high importance as, currently, the gold standard for microvascular analysis is manual analysis by a trained researcher.34 This process is time-consuming (approximately 2 min per microcirculation image) and prone to subjective bias. Moreover, the length of analysis and requirements for trained researchers prevent microvascular monitoring from being used in routine clinical applications.37 In this article, we present the relevant methods in deep neural networks and traditional computer vision algorithms that can be used to achieve the automation of microcirculation image analysis.
Moreover, cardiogenic shock can deteriorate microcirculatory functions and cause disturbances. Therefore, speeding up the process of microcirculation analysis via accurate automation processes can assist in evaluating the efficiency of techniques used to treat a patient suffering from a cardiogenic shock. Other broader relevant diseases that affect microcirculation include but are not limited to shock reperfusion,38 iatrogenic injury,38,39 and pancreatitis.40
Traditional computer vision techniques require the user to find the optimal set of values to segment the capillaries in an image, while deep neural networks can automatically attempt to find those values based on the dataset provided.41 Although deep neural networks can automate the manual segmentation process, traditional computer vision algorithms require less computational power and are faster than deep neural network methods.42 Thus, we present both techniques in this review paper. That being said, we believe that a combination of both of these methods will be the future of microcirculation analysis in the clinical setting.36
This paper is intended to serve as a guidebook to inform researchers on what deep learning and computer vision techniques exist that can be used for the automated quantification of capillaries. The goal of the methods presented in this paper is to reduce the labor-intensive, time-consuming analysis from several minutes to seconds. Furthermore, these methods aim to reduce the subjectivity of the interobserver variability by using a standardized method. The capillary dataset section presents and discusses the capillary dataset. The machine learning section introduces machine learning and its four types. The deep learning section goes into the details of convolutional neural networks. The deep learning object detection techniques section introduces the two object detection techniques: the region proposal-based framework and the unified framework. The traditional computer vision object detection techniques section introduces traditional computer vision object detection techniques. The state-of-the-art microcirculation image analysis techniques section presents the current methods and techniques published and used by researchers for capillary quantification. We then conclude our paper.
Capillary dataset
Capillaroscopy is a method that noninvasively checks the dermal papillary capillaries using a microscopy system.43 The image(s) obtained can then be evaluated for capillary density, dimension, and morphology.43 Figure 1 was obtained from Ruaro et al.44 and best illustrates this. Ruaro et al.44 describe systemic sclerosis as a disease that alters the microvascular structure, which can be seen using capillaroscopy. Figure 1A shows the typical pattern with no disease; Figures 1B–1D show how the capillaries react to different disease stages.44
Figure 1.
Systemic sclerosis disease affects the microvascular structure
In (A), we see a normal pattern, while in (B, C, and D), we notice changes in the capillary structure as the disease advances. This image is from Ruaro et al.44
Microcirculation data most prominently come from the sublingual region,45 the eyes,46 the dorsum of the hand,47 or the nail bed.48
Several microscopes are currently available that can capture such images: Dino-Lite CapillaryScope,49 Optilia Digital Capillaroscope,50 Inspectis Digital Capillaroscope,51 and Smart G-Scope.52 Other equipment that can capture capillaries includes the dermatoscope, ophthalmoscope, and stereomicroscope; however, none of these was designed for capillary capture. Therefore, images from this equipment are of relatively lower quality53 than images captured by the dedicated microscopes. That said, contemporary microscopes designed for capillaroscopy still do not produce adequate image quality compared with the images in a standard object recognition dataset. Some examples of standard object recognition datasets used to develop object recognition algorithms are ImageNet (14 million+ images and 21,841 categories),54 Common Objects in Context (COCO) (328,000+ images and 91 categories),54 Places (10 million+ images and 434 categories),55 and Open Images (9 million+ images and 6,000+ categories).56 These datasets have several thousand images per category, and the objects to be identified are far less pixelated than the capillary data. Thus, one of the main challenges is the dataset’s quality. Since the capillaries are relatively small, measuring less than 20 μm in diameter,24 the shape of the capillary tends to be pixelated. Furthermore, in the literature, there is no clear definition of capillary shapes, which exponentially increases the challenge of not having enough data for each shape/type of capillary, as shown in Figure 2.
Figure 2.
Example of capillary dataset captured using a relatively high-end microscope
We see that the shapes of capillaries are not the same and can be faded. We also see instances with black spots and other cases with white spots, which is a reflection due to the oil. These are considered artifacts and are undesirable. Such issues make it challenging to develop a highly accurate algorithm that can generalize in detecting capillaries.
Machine learning
Machine learning (ML) is a sub-branch of AI that combines techniques from computer science and statistics.57 ML aims to find patterns in data to make predictions about new data.58 This process of finding a pattern from a set of data is known as training.59 The product of this process is a model that is used to make predictions on new data.59 An ML framework can consist of seven processes,60,61,62 which are data collection, data preparation, model selection, model training, testing the selected model, evaluating the model (e.g., using the F1-score), and finally deploying the model to production.
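The evaluation step of this pipeline is commonly based on the F1-score. As a minimal illustration (the binary capillary/background labels are an assumption for this sketch, not data from the paper), the metric can be computed in a few lines of Python:

```python
def f1_score(y_true, y_pred):
    """F1-score for binary labels (1 = capillary region, 0 = background)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # precision 0.5, recall 0.5 -> 0.5
```

Because the F1-score is the harmonic mean of precision and recall, it penalizes a model that is strong on one but weak on the other, which is why it is a common choice for the evaluation step.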
ML techniques can be further divided into supervised learning, unsupervised learning, reinforcement learning, and transfer learning.63
Supervised learning is an ML technique where the algorithm learns from data labeled by a human expert.64 In terms of microcirculation data, the model will attempt to highlight the capillaries with a bounding box based on the sample of labeled data provided to it. Supervised learning can be further divided into regression and classification. Regression predicts a value, while classification predicts a class. In the case of microcirculation analysis, we predict whether a region contains capillaries or not, so classification algorithms from the supervised learning family are the most suitable for capillary detection.
Unsupervised learning, as opposed to supervised learning, gives a set of data with no labels to the algorithm.65 The algorithm then attempts to find patterns in the data.66 There are two types of unsupervised learning: clustering and association.67 Clustering attempts to find subgroups in the data based on color, density, or other features. In contrast, association tries to discover relationships between patterns. For example, a system that recommends what a user should purchase next based on what other users have bought is performing association. Another example is anomaly detection, where the system finds data that exhibit patterns outside the normal range of the data. We did not find any suitable unsupervised algorithms for capillary detection, nor any published papers on capillary detection that use unsupervised algorithms.
Reinforcement learning finds patterns based on a reward system.68 When the system predicts the right output, the weights of the corresponding nodes in the neural network are increased, while an incorrect prediction diminishes those weights.69 Reinforcement learning is common in game development for games such as chess and Go.70,71 We did not find any suitable reinforcement learning algorithms for capillary detection.
Transfer learning uses a model trained on previous data and tweaks the weights of the last n (defined by a machine learning expert) layers to train it on new data.72 An alternative is to add new layers on top of the existing layers of the network. It is assumed that the earlier layers of a transfer learning model detect the generic features of the image, such as the width, height, and edges, and that the later layers learn the fine details of the classification.73 Therefore, by freezing the earlier layers while training the later layers on a new set of data, one can adapt any model to different datasets without having to retrain the algorithm from scratch.74
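The freezing step can be made concrete with a minimal numpy sketch. The tiny two-layer linear network, its sizes, and the random data below are all illustrative assumptions rather than a real pretrained model; the point is only that gradient updates are applied to the later layer while the "pretrained" early layer stays fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: W1 stands in for the pretrained "early" layers
# (frozen), W2 for the "later" layers fine-tuned on the new dataset.
W1 = rng.normal(size=(4, 3))       # pretrained weights, frozen
W2 = rng.normal(size=(3, 1))       # weights to be fine-tuned
x = rng.normal(size=(1, 4))        # one sample from the new dataset
y = np.array([[1.0]])              # its label

h = x @ W1                         # frozen feature-extractor output
lr = 0.5 / (h @ h.T).item()        # step size chosen for stable convergence
W1_before = W1.copy()

for _ in range(50):
    pred = h @ W2                  # forward pass through the trainable layer
    err = pred - y                 # squared-error gradient w.r.t. pred
    W2 -= lr * h.T @ err           # update ONLY the later layer
    # W1 is never touched: this is the "freezing" step of transfer learning

assert np.allclose(W1, W1_before)  # early layers unchanged
```

In a deep learning framework the same effect is typically achieved by disabling gradient computation for the frozen layers rather than by skipping the update manually.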
For microcirculation capillary quantification, our review work reveals that the most relevant ML techniques for microcirculation analysis are supervised classification learning and transfer learning.
Deep learning
Deep learning is a sub-branch of ML that has vastly advanced the state of the art in speech recognition and visual object detection.75 In its simplest form, a deep neural network is composed of multiple layers of neurons.75 This typically consists of the input layer, hidden layer(s), and the output layer.75 Deep learning outshines the other ML techniques because it can construct a feature extractor without the need for domain expertise.76 Rather, the neural network increases prediction accuracy by adjusting the neuron weights to optimize an appropriate cost function using back-propagation techniques.77 Deep learning architectures combine multiple non-linear modules to transform the input data into higher levels of abstraction, suppressing irrelevant information while intensifying the useful parts to increase prediction accuracy.78 In this section, we describe several deep learning architectures and then delve into the details of convolutional neural network (CNN) architectures.
Types of deep neural networks
There are many types of deep neural networks. In this section, we focus on the three most relevant for microcirculation analysis: recurrent neural networks (RNNs),79 generative adversarial networks (GANs),80 and CNNs.81
RNNs are a type of neural network that maintains a hidden state to find patterns in sequential data, using the output of previous steps as an input for the current step.79 The nodes of an RNN are connected by directed cycles, allowing it to keep the state of the previous nodes. While a traditional neural network takes in an input and gives an output, an RNN assumes a relationship exists among the sequence of input data.82 Thus, RNNs are mainly used in speech recognition, translation, and sequential data analysis.82
Long short-term memory networks (LSTMs)83 and gated recurrent units (GRUs)84 are RNN architectures that deal with the vanishing gradient challenge.85 The vanishing/exploding gradient problem arises when the gradients propagated back through the layers repeatedly shrink toward zero (or grow without bound), which prevents the weights of the earlier layers from converging. The modifications introduced by LSTM and GRU deal with this challenge by reducing the amount of irrelevant information propagating through the architecture, adding appropriate gates to the vanilla RNN.
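The shrinking of gradients can be illustrated in a few lines of Python: the derivative of the sigmoid is at most 0.25, so back-propagating through a stack of sigmoid layers multiplies the gradient by at most 0.25 per layer (the 20-layer depth here is an arbitrary illustration):

```python
# The sigmoid s(x) = 1 / (1 + e^-x) has derivative s(x) * (1 - s(x)),
# which peaks at 0.25 (when s(x) = 0.5).
SIGMOID_DERIV_MAX = 0.25

grad = 1.0
for layer in range(20):
    # Each layer in the backward pass multiplies in another derivative factor.
    grad *= SIGMOID_DERIV_MAX

print(grad)  # 0.25**20 = 2**-40, roughly 9.1e-13
```

Even in this best case the gradient reaching the first layer is about twelve orders of magnitude smaller than at the output, which is why gated architectures such as LSTM and GRU were introduced.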
GANs create new instances of the data.86 They are made up of a generator (G) and a discriminator (D), whose tasks are opposed to each other. The generator receives a random noise input z, and its output is a fake data sample G(z). The discriminator D receives as input both G(z) and a real sample x, and its output is the probability that G(z) is real. A well-trained G can reliably produce new instances that mimic those in the real dataset. One of its main uses is data augmentation, creating artificial data that resemble the real data as much as possible. Furthermore, GANs can be used to enhance images by upscaling them and increasing their resolution.87
CNNs are the most prominent architecture used for medical image analysis, specifically for microcirculation analysis.77,88,89,90 They were originally developed in 1998 to recognize zip codes91 and digits. Gradually, they have become the most relevant architecture in image classification.
CNNs
In this subsection, we dive into the details of a CNN.
A CNN pipeline
A typical CNN image classification pipeline can consist of three main stages:
• Dataset: a set of images labeled with their corresponding classes
• Learning: a model that has learned from every class of the data
• Evaluation: evaluating the performance of the model on a set of data that it has not been previously trained on92
The data at the first step can be further split into a training set and a validation set so that one can evaluate the model before new data come in. In this way, a pipeline can end up with three sub-datasets: a training set, a validation set, and a test set. A good CNN should be invariant to the challenges listed below:93
• Variation of viewpoint: a single instance of an object may be oriented in various ways with respect to the camera
• Variation of scale: the scale of visual groups is often variable
• Deformation: many objects of interest are not rigid bodies and may be deformed dramatically
• Occlusion: objects of interest can be obscured, so that only a small portion of an object (a few pixels) is visible at any given moment
• Illumination: different brightness levels on different parts of the image can significantly affect algorithm performance
• Clutter in the background: the objects of interest can blend in with their surroundings, making them difficult to spot
• Variation within a class: the categories of interest, such as chair, may be very general; such objects come in various shapes and sizes, each with a distinct appearance
Data preprocessing can be done in two ways: mean subtraction and normalization.94 The most popular form of preprocessing is mean subtraction.94 It entails subtracting the mean from all of the data’s features, with the geometric interpretation of centering the cloud of data (pixel values) in all dimensions (1 in the case of black and white, 3 in the case of RGB). Normalization is the process of bringing the feature scales closer together. It can be accomplished in two ways. The first is to zero-center each dimension and divide it by its standard deviation. The second is to normalize each dimension such that the minimum and maximum values lie between negative 1 and positive 1. In the next section, we discuss the different parts that make up a CNN: the fully connected (FC) layer, the convolutional layer, and the pooling layer.81
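Both preprocessing forms can be sketched in a few lines of numpy; the batch of random 8 × 8 RGB "images" below is an illustrative stand-in for real microcirculation data:

```python
import numpy as np

rng = np.random.default_rng(1)
# A batch of 10 small RGB images with 8-bit pixel values (illustrative data).
images = rng.integers(0, 256, size=(10, 8, 8, 3)).astype(np.float64)

# Mean subtraction: center the cloud of pixel values in every dimension.
centered = images - images.mean(axis=0)

# Normalization, form 1: zero-center and divide by the standard deviation.
standardized = centered / images.std(axis=0)

# Normalization, form 2: rescale each pixel value into [-1, 1].
scaled = images / 127.5 - 1.0
```

The same per-pixel mean (and standard deviation) computed on the training set must be reused on the validation and test sets, otherwise the three sub-datasets are preprocessed inconsistently.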
The anatomy of a FC layer
In this subsection, we describe the details of a FC layer.94,95,96
Weight initialization: each neural network node consists of parameters known as weights, which take numerical values and contribute to the sum of the other weighted input signals. It is recommended to initialize a neural model’s weights with small random values to help it converge over the epochs. There are more principled ways to initialize the weights in a neural network, such as Xavier weight initialization and He weight initialization. At every iteration, the algorithm tries to find the most appropriate set of weights for each node. For example, in stochastic gradient descent, the neural network weights are gradually adjusted according to a cost function.
A cost function is used in supervised learning to measure the difference between the predicted result and the expected value. There are two broad types of loss functions: classification losses, used when the classes form a fixed set, and regression losses, used when predicting a quantified value.
Regularization: there are four common ways to regularize a neural network: L2 regularization, L1 regularization, max norm constraints, and dropout. The most popular form of regularization is L2 regularization. It is applied by penalizing the squared magnitudes of all parameters. The intuitive understanding of L2 regularization is that it strongly penalizes peaky weight vectors while favoring diffuse weight vectors. With L1 regularization, the network becomes almost invariant to noisy inputs, because it relies on the most critical inputs and almost all irrelevant information is discarded. In general, if a CNN architecture is to be used for feature selection, L1 regularization handles it best among these regularization methods.
Max norm constraints are another way to regularize the weights by imposing an absolute upper limit on the magnitude of each neuron’s weight vector; the constraint is enforced using projected gradient descent. In practice, this entails updating the parameters as usual and then clamping the weight vector to enforce the constraint. Dropout is a simple, efficient technique that removes a random percentage of neurons, which the developer can specify.
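The four regularization methods can be sketched in a few lines of numpy; the weight vector, norm limit, and dropout rate below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(100,))               # an illustrative weight vector

# L2 penalty: strongly penalizes peaky weights, favoring diffuse vectors.
l2_penalty = 0.5 * np.sum(W ** 2)
# L1 penalty: drives irrelevant weights toward zero (feature selection).
l1_penalty = np.sum(np.abs(W))

# Max norm constraint: after the usual update, clamp the weight vector
# whenever its magnitude exceeds a fixed limit.
max_norm = 3.0
norm = np.linalg.norm(W)
if norm > max_norm:
    W = W * (max_norm / norm)

# Dropout: zero out a random fraction of activations during training.
p_drop = 0.5
activations = rng.normal(size=(100,))
mask = rng.random(100) >= p_drop
dropped = activations * mask / (1 - p_drop)  # inverted-dropout rescaling
```

The penalties are added to the cost function during training, whereas the max norm clamp and the dropout mask are applied directly to the weights and activations.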
Activation function: this part of a CNN takes in a number and performs a fixed mathematical operation on it. Many activation functions can be used in a CNN; the common ones are sigmoid, tanh, ReLU, leaky ReLU, and maxout.
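Most of these activation functions are one-line operations; a numpy sketch:

```python
import numpy as np

def sigmoid(x):
    """Squashes any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Squashes any input into (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """Passes positive values through, zeroes out negative ones."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but leaks a small slope alpha for negative inputs."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))        # negative input becomes 0, positives pass through
print(leaky_relu(x))  # negative input becomes -0.02 instead of 0
```

Maxout is omitted here because it is not a pointwise function: it takes the maximum over several linear transformations of the input rather than operating on a single number.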
Hyper-parameter optimization is finding the right set of values for all of the above properties in a neural network. Mainly, finding the initial learning rate, the decay constant of the learning rate, and the regularization values can strengthen or weaken the neural network. During the forward pass, the score is calculated by applying the operations of all the blocks on the input value. On the backward pass, we compute an updated value of the weights that minimizes the loss function, which increases the overall accuracy of prediction.
The convolutional layer and the pooling layer
In this section, we describe the details of the convolutional and pooling layer.93,94,97
Similar to the anatomy of a node in an FC layer explained in the previous section, a convolutional block is initialized with weights and has a cost function, activation function, and regularization. The difference is that convolutional net architectures presume that the inputs are images, taking the spatial structure of the input into consideration. These processes and parameters decrease the network’s overhead, reducing the number of parameters the program will execute as well as increasing the speed. If we were to pass a full HD image to a neural network (1920 × 1080 × 3), there would be approximately 6.2 million weights to initialize, one weight for each pixel value. Instead, we pass that image to a series of convolutional blocks first to reduce the number of weights from 6.2 million to possibly several hundred thousand, without compromising accuracy. This reduction is achieved by transforming the image into a more representative form using a CNN architecture.
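The arithmetic behind this reduction can be checked directly; the 32-filter, 5 × 5 × 3 convolutional layer below is an illustrative configuration, not one prescribed by the text:

```python
# Fully connected input for a full HD RGB image: one weight per pixel value.
full_hd_weights = 1920 * 1080 * 3
print(full_hd_weights)  # 6220800

# A convolutional filter shares its weights across the whole image, so a
# layer of 32 filters of size 5x5x3 needs only 32 * (5*5*3 + 1) parameters
# (the +1 is the bias term of each filter).
conv_weights = 32 * (5 * 5 * 3 + 1)
print(conv_weights)  # 2432
```

The contrast (roughly 6.2 million input values against a few thousand shared filter weights) is the weight-sharing argument for preferring convolutional layers over a fully connected first layer.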
A simple CNN architecture consists of four parts:
• input, where data are loaded as a matrix in raw form (for example, a full HD image will be a matrix of 1920 × 1080 × 3)
• a convolutional layer, which takes in a value and applies the dot product operation followed by some kind of non-linearity (activation function)
• the pooling layer, which applies downsampling to the matrix
• the FC layer, which computes a prediction associated with each class
With each iteration, the values of the convolutional and FC layers change as their weights are adjusted, while the activation function and pooling layer stay constant throughout the whole process. The details of each part are described below.
A convolutional layer consists of a set of filters, each forming a matrix that is typically 5 × 5 × 3. However, these values are strictly experimental and depend on the image used. A convolutional layer can have any number of filters specified by the machine learning expert. The filter values are randomly generated and are updated during back-propagation to reduce the loss value, which increases the probability of correct classification. Each filter slides over the image with a given stride, producing a smaller image known as the activation map. The stride dictates the number of weights to be initialized later: as the stride increases, the number of weights to be initialized at the FC layer decreases. For example, a stride of two means the filter will jump two pixels in the image before applying the dot product, skipping some pixels; there is thus a trade-off between computation and spatial detail. Each filter can detect a different image property; for example, a filter can detect horizontal or vertical edges or types of colors. The number of filters in this layer is referred to as the filter depth. Each filter produces an activation map, and these are then stacked on top of each other. Padding is another concept in the convolutional layer that involves adding zeros around the borders of the input image to preserve the sizes of the input and output shapes. Combining the filter depth, stride, and padding, the output volume of the convolutional layer can be calculated as (W − K + 2P)/S + 1, where W is the input height/width, K is the filter size, P is the padding, and S is the stride. For example, a single black-and-white image of dimension 200 × 200 with a stride of 1, padding of 0, and filter size of 5, with 32 filters, will turn the image from 200 × 200 × 1 into 196 × 196 × 32. Each layer of this volume represents the activation map produced by an individual filter.
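The output-volume formula can be wrapped in a small helper and checked against the example above:

```python
def conv_output_size(w, k, p, s):
    """Output width/height of a convolutional layer: (W - K + 2P) / S + 1."""
    return (w - k + 2 * p) // s + 1

# The example from the text: a 200x200x1 image, 5x5 filters, padding 0, stride 1.
side = conv_output_size(200, 5, 0, 1)
print(side)  # 196 -> with 32 filters the output volume is 196 x 196 x 32

# "Same" padding: a 32x32 input with 5x5 filters and padding 2 keeps its size.
print(conv_output_size(32, 5, 2, 1))  # 32
```

Choosing P = (K − 1)/2 with a stride of 1 preserves the spatial dimensions, which is why that padding setting is a common default.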
This number is very high compared with the input value, and thus a pooling layer is applied to reduce the number of parameters for which weights must be initialized. A pooling layer is typically inserted between consecutive convolutional layers before the final value is passed into an FC layer. The pooling layer is applied individually to each activation map; the depth does not change, but the activation map’s width and height are reduced. The pooling layer scales the image and takes two parameters: the window size and the stride. The larger these values are, the smaller the output image will be. There are several types of pooling, but the most common are average and max pooling. Average pooling takes the mean of a window of pixels, while max pooling takes the maximum value within the window. So a pooling layer with a 2 × 2 window and a stride of 2 halves the image’s spatial dimensions, from 196 × 196 × 32 to 98 × 98 × 32. Applying several of these between consecutive convolutional layers reduces the dimensions of the activation map to those most relevant for prediction. The values from the last pooling layer are flattened, or converted from 2D to 1D, and passed to the FC layer explained earlier. In the next section, we look into how the convolutional layers, activation function, and pooling function are used as fundamental building blocks to predict the image classification.
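Max pooling itself is only a few lines of numpy; this sketch implements a 2 × 2 window with a stride of 2 on a single activation map (the 4 × 4 input is an illustrative toy example):

```python
import numpy as np

def max_pool_2x2(activation_map):
    """2x2 max pooling with a stride of 2: halves the width and height."""
    h, w = activation_map.shape
    trimmed = activation_map[:h - h % 2, :w - w % 2]  # drop odd edge rows/cols
    # Group pixels into 2x2 blocks, then take the maximum of each block.
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]], dtype=float)
print(max_pool_2x2(a))
# [[4. 8.]
#  [9. 7.]]
```

Average pooling would replace `.max(axis=(1, 3))` with `.mean(axis=(1, 3))`; in a full CNN this operation is applied independently to each of the stacked activation maps, leaving the depth unchanged.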
Types of CNNs
LeNet,91 created in 1998, uses a five-layer CNN: two convolutional layers and three FC layers. The convolutional layers were made up of 5 × 5 filters with a stride of 1 and a sigmoid activation function, each followed by a 2 × 2 average pooling layer with a stride of 2. The FC layers contained 120, 84, and 10 neurons, respectively, using softmax as the output activation. This CNN’s input data was a 32 × 32 grayscale image, which is relatively small by today’s standards.
AlexNet,98 released in 2012, outperformed LeNet. It used an eight-layer-deep CNN: five convolutional layers, two FC hidden layers, and one FC output layer. There are several significant differences between AlexNet and LeNet. AlexNet uses ReLU as the activation function. Moreover, AlexNet uses dropout instead of weight decay for regularization. AlexNet also uses more neurons and different filter sizes for each convolutional layer. For the FC layers, AlexNet uses 4,096, 4,096, and 1,000 neurons, respectively, compared with the 120, 84, and 10 of LeNet. AlexNet uses 11 × 11, 5 × 5, and 3 × 3 filters for its convolutional layers, while LeNet uses 5 × 5 filters in both of its convolutional layers.
Visual Geometry Group network (VGGNet),99 developed in 2014 at Oxford University, consists of the basic CNN building blocks: a convolutional layer, a ReLU activation function, and a max pooling layer. It uses 3 × 3 filters with a padding of 1 and 2 × 2 pooling with a stride of 2. Moreover, the authors experimented with several different architectures and concluded that deeper and narrower layers obtain better results than fewer and wider convolutional layers.
The above three architectures have a common pattern of using the convolutional layer followed by pooling with minimal tweaks. The next three architectures came later and use a slightly different design pattern.
GoogLeNet,100 published in 2015, outperformed the previous three architectures, achieving close to human-level performance with its new inception module. An inception block applies four different branches to the input and then concatenates their outputs. The first three branches apply convolutional layers of different window sizes, while the fourth applies max pooling followed by a convolutional layer. The number of blocks in an inception module can be tuned for different datasets using hyper-parameter tuning. GoogLeNet outperformed the others because it aims to extract the most spatial information possible from each layer. Instead of extracting information only from the previous block’s output, inception explores each image with different filter sizes. It is like taking the same photograph with different lens magnifications.
Residual neural network (ResNet),101 published in 2016, was designed to address the diminishing returns of making neural networks deeper. As the number of blocks increased, the accuracy gain per block decreased to the point where making the network even deeper increased complexity and computational cost while reducing accuracy. ResNet addresses this using the residual block, which utilizes a skip connection with heavy batch normalization. Like the VGGNet convolutional layer design, a ResNet block consists of two consecutive 3 × 3 convolutional layers with an activation function. In addition, there is a connection between the block’s input and its output, known as the skip connection. Like GoogLeNet, which uses four modules within the inception block, ResNet uses four modules within the residual block. It also uses a global average pooling layer before the FC layer.
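The skip connection can be sketched with dense layers in numpy. Real residual blocks use 3 × 3 convolutional layers and batch normalization; the dense layers, sizes, and random inputs here are simplifying assumptions made only to show the output = F(x) + x structure:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """Two weight layers plus a skip connection: output = relu(F(x) + x)."""
    f = relu(x @ W1) @ W2   # the residual mapping F(x)
    return relu(f + x)      # the skip connection adds the input back

rng = np.random.default_rng(3)
x = rng.normal(size=(1, 8))
W1 = rng.normal(size=(8, 8)) * 0.1  # small weights keep F(x) a perturbation
W2 = rng.normal(size=(8, 8)) * 0.1
out = residual_block(x, W1, W2)
```

Because the block only has to learn the residual F(x) rather than the full mapping, gradients can flow through the identity path, which is what allows very deep networks to keep training effectively.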
For microcirculation image analysis, it is not enough to use a CNN architecture alone; these architectures only detect whether an image contains a capillary. They cannot pinpoint the capillaries’ locations. To do that, we need to extend the CNN architecture with an object detection architecture.
Deep learning object detection techniques
Object detection techniques aim to estimate the location and label of an object in an image.102 The object detection part extends the CNN architecture. Object detection techniques can generally be split into two distinct categories. The first category, which is a two-step method, aims to first locate the object in the image (object localization) and then estimate the category of the object (object classification). These architectures can be referred to as the region proposal-based framework. The second category is a one-step method, which aims to locate and categorize the objects in one go. These architectures are known as the unified framework.103 Before these methods were developed, the field was dominated by hand-crafted feature techniques such as the scale-invariant feature transform (SIFT) (from 1999 to 2012).98,104,105 Object detection architectures are benchmarked by measuring the mean average precision (mAP) and efficiency (speed of detection per frame) on standardized datasets.54,55,56,106,107,108 In this section, seven selected architectures from the region proposal-based framework are described along with five selected architectures from the unified framework.
Region proposal-based framework
R-CNN (rich feature hierarchies for accurate object detection and semantic segmentation)109: when this paper was released in 2014, the best-performing object detection architectures had plateaued from 2010 to 2012, with an accuracy of 35% on the most popular datasets.105 This algorithm achieved 20% higher accuracy than its predecessors on the VOC 2012 dataset.109 The method was termed regions with CNN features, or R-CNN. It is also one of the first methods to propose a two-step approach, and many subsequent methods have been based on this approach from 2014 to this day.93 Region proposal-based frameworks are inspired by the combination of deep CNNs (DCNN)98 and region proposals.110 R-CNN takes in an image, applies a segmentation mask to it, and extracts the 2,000 most promising bounding boxes from that segmentation. The bounding boxes are of different scales, increasing the probability of identifying objects of different sizes or shapes. It then computes the features of these boxes using a CNN and classifies each region with a linear SVM. Alternatively, if training data is scarce, the method can be slightly adjusted: a CNN is pre-trained in a supervised fashion on ImageNet and then fine-tuned on the small training set. The R-CNN consists of two consecutive stages: the region detector and the object detector. For the region detector, R-CNN takes an image and applies a non-deep learning model called selective search to extract approximately 2,000 regions of interest (ROI). These regions represent the places in the image where an object is more likely to reside. Each proposed region is then warped or cropped to fit a specific dimension before being passed into the object detector. The object detector applies CNN + max-pooling with an activation function to calculate the feature map. The feature map is then passed to an FC layer to create a 4096-dimensional vector.
This vector is passed to a classification head that estimates the class and a regression head that refines the box coordinates. The classification head is optimized using cross-entropy loss, while the regression head is refined using L2 loss. The model is trained by optimizing first on the classification loss and then on the regression loss. This can take up to several days and requires large storage space, since the features computed from the proposed regions occupy many gigabytes. The paper’s methodology consists of three modules:
• Generating the region proposals: selective search method is used to suggest the regions
• Extracting the features using CNN: a 4096-dimensional vector is extracted from each region generated by the previous step using the method applied by the DCNN.111 The features are computed by subtracting the mean from a 227 × 227 image and passing it through five convolutional layers and two FC layers. The output region is warped equally with p = 16; however, the paper suggests that alternative values can be used
• Extracting the class using SVM: each region is scored using an SVM, and greedy non-maximum suppression is applied independently for each class in the proposed regions. If two regions have an intersection-over-union higher than a threshold, the lower-scoring region is rejected
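The greedy non-maximum suppression step described above can be sketched as follows; this is an illustrative implementation (the corner-coordinate box format and the threshold value are assumptions), not the paper's exact code:

```python
def iou(a, b):
    # intersection-over-union of two boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # greedy NMS: keep the highest-scoring box, then drop any remaining
    # box whose IoU with it exceeds thresh; repeat on the survivors
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```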
If there is a lack of training sets, the paper suggests the addition of these two modules:
• Supervised pre-training: it starts by training the CNN on one of the large datasets as an image classification problem using the Caffe CNN library
• Domain-specific fine-tuning: the warped regions created in step two of the above method are used to fine-tune the CNN parameters using stochastic gradient descent.
R-CNN, applied to the PASCAL VOC 2007 dataset, achieves a 53.7% mAP. This result is a big jump from the algorithms previously proposed for this dataset (in 2010), where the highest achieved mAP was 35.1%. R-CNN is a big step toward building a high-quality object detection architecture after the SIFT era and into the DCNN era, as the jump in accuracy it introduced shows. However, there are some drawbacks to using R-CNN. First, R-CNN has multi-stage, multi-step modules that need to be optimized individually to achieve good results, which increases the chances of introducing inaccuracies and makes training notably longer. Second, R-CNN uses an FC layer that requires a fixed input shape. Moreover, approximately 2,000 regions are extracted, which is arguably too many for sparse images and too few for denser ones. Such disadvantages have led to the development of successors such as SPP-Net (spatial pyramid pooling), Fast R-CNN, Faster R-CNN, region-based fully convolutional network (R-FCN), feature pyramid network (FPN), and Mask R-CNN, which are presented in the next paragraphs.
SPP-Net (spatial pyramid pooling in deep convolutional networks for visual recognition)112: this method introduces two changes to the existing R-CNN architecture. First, it tackles the challenge of having a fixed-size input window, since important information can be lost or distorted, reducing accuracy. Second, SPP-Net computes the feature map once for the whole image instead of repeatedly computing it on each ROI as R-CNN does. The fixed-size window challenge is tackled by adding a spatial pyramid on top of the last convolutional layer, before the FC layer. Instead of cropping or warping the image, this method aggregates the information by pooling the features and feeding them to the FC layer. The spatial pyramid pooling is an extension of the Bag-of-Words model released in 2006.113 The difference between the R-CNN method and the spatial pyramid pooling method can be illustrated as follows: the R-CNN method takes in an image, applies crop/warp, and passes it to the convolutional layer and then the FC layer. The SPP-Net method takes an image, passes it directly to the convolutional layer, applies the spatial pyramid pooling, and then passes it to the FC layer. The SPP-Net takes the feature maps from the last convolutional layer and creates fixed-length feature vectors regardless of the input image size. Images with different sizes can be pooled and aggregated into a spatial pyramid, which is then passed to the FC layer. When this paper was released, four existing object detection architectures were compared with their non-SPP counterparts (ZF-5, Convnet∗5, Overfeat-5, Overfeat-7), and the CNN with SPP-Net showed state-of-the-art classification results on Pascal VOC 2007 and ranked number 2 in the ILSVRC 2014 competition.93 The spatial pyramid pooling method is more efficient than its predecessor since it obtains a significant speedup.
The speedup is due to the fact that the CNN layer generates the feature map by running a single pass over the image. Furthermore, it is more accurate since it can learn feature maps at any scale without losing information to cropping or warping. The multi-level pooling makes the network more robust to deformation of the input images. The main drawback of SPP-Net is that it is still a multi-stage, multi-step pipeline (feature extraction, network fine-tuning, SVM training, bounding box regression, feature caching), making it relatively slow. Furthermore, the authors mention that the accuracy of SPP-Net drops when using deeper CNNs, since fine-tuning cannot update the layers before the pyramid layer, which reduces accuracy and makes back-propagation very difficult to implement.
Fast R-CNN114: this paper addresses the problems arising from the SPP-Net and R-CNN architectures. Until this paper’s 2015 publication, object detection methods required generating several hundred regions known as “proposals” to create a feature map; the generated proposals then estimated the localization of the object. These proposals reduced speed and accuracy while increasing complexity. Similar to R-CNN, this method uses selective search to find the regions and then passes them to the object detector. The method also consists of two sibling heads: one for classification to get the class category and one for regression to calculate the bounding box coordinates. The difference is that instead of running the CNN several times on the ROIs, it runs it only once by introducing ROI pooling. Second, it streamlines the process on the object detector side, where it jointly classifies and learns the object’s location simultaneously by using a multi-task loss. This method generally achieves a higher mAP by having a single training stage with a multi-task loss. The increased accuracy is obtained by updating all layers. The speed is achieved by not requiring the features to be cached and because Fast R-CNN learns the softmax classifier and bounding box regression together rather than in two separate processes. These improvements have led to a major decrease in the storage space needed. Unlike R-CNN, Fast R-CNN creates a feature map from the entire image. Furthermore, the Fast R-CNN method improves efficiency compared with SPP-Net: 3× in training and 10× in testing. The authors report that “Fast R-CNN trains 9× faster than R-CNN on the VGG-16 and 213× faster at test-time, with a higher mAP on the PASCAL VOC 2007, 2010 and 2012 dataset … ”114 These speed improvements result from a single process that updates all layers without requiring feature caching.
Moreover, to reduce the time spent on the FC layers, Fast R-CNN uses a truncated singular value decomposition (SVD) to accelerate the testing procedure.115 This method significantly increased the speed and efficiency of object detection, firstly by streamlining the whole process and secondly by applying SVD at test time. Thus, Fast R-CNN is more of a speed improvement than an accuracy improvement. On the same dataset that took 84 h to train, Fast R-CNN trains in 9 h. A major drawback is that Fast R-CNN still relies on external region proposals, which makes the whole process relatively slow. It uses the selective search method to find the ROIs. Furthermore, later research has concluded that convolutional layers are sufficient to localize objects; therefore, adding an FC layer slows down the process unnecessarily.
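The truncated SVD idea can be sketched with NumPy: a u × v FC weight matrix is replaced by two smaller factors of total size t(u + v), which is cheaper to apply when the rank t is small. The function name and shapes here are illustrative, not Fast R-CNN's actual code:

```python
import numpy as np

def truncate_fc(W, t):
    # approximate an FC weight matrix W (u x v) by rank-t factors,
    # turning one FC layer of u*v multiplications into two layers
    # costing t*(u + v) multiplications in total
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # W ~= (U[:, :t] * S[:t]) @ Vt[:t, :]
    return U[:, :t] * S[:t], Vt[:t, :]
```

For a weight matrix whose effective rank is low, the approximation is nearly exact while the test-time cost drops substantially.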
Faster R-CNN: toward real-time object detection with region proposal networks116: optimizations introduced by SPP-Net and Fast R-CNN exposed the fact that using external region proposal methods slows down the process. Previous networks mainly relied on selective search110 and Edge box117 to create region proposals. This paper introduces a region proposal network (RPN), which aims to replace the selective search and Edge box. The RPN introduces an almost cost-free proposal computation. For the RPN to compete with methods such as selective search, it has to predict ROIs of multiple scales and ratios from an image much faster. Thus, the RPN introduces the novel concept of creating anchors on the feature maps. The RPN layer takes in the feature map and generates rectangular object bounds using a CNN, which become the new ROIs. Faster R-CNN can be trained end-to-end like Fast R-CNN. The RPN output tells the Fast R-CNN where to look. The Faster R-CNN architecture is complex because it has several interconnected parts. The RPN first initializes anchors of different ratios and scales on the feature maps created by the convolutional layer. The authors use nine types of anchors per location: three scales combined with three ratios. These anchors are mapped and fed into two FC layers, where one layer is responsible for the category classification and the other for the box regression. The RPN shares the convolutional features with the Fast R-CNN, enabling the same efficient computation as mentioned in the methodology of the previous paper. On the VGG-16 model,61 Faster R-CNN efficiently performs all steps at 5 fps with an accuracy exceeding all recorded results on the VOC 2007 dataset with 73.2% mAP, and on VOC 2012 with 70.4% mAP. Although this method is several times faster than Fast R-CNN, it still relies on applying several hundred ROIs per image to detect the regions of interest.
This leads to computations not being shared after the ROI layer, reducing this method’s overall efficiency.
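The anchor generation described above can be sketched as follows; the scale and ratio values are illustrative defaults, not necessarily those used in the paper:

```python
import math

def make_anchors(cx, cy, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    # nine anchors per feature-map location: 3 scales x 3 aspect ratios,
    # each returned as (x1, y1, x2, y2) centred on (cx, cy); scaling by
    # sqrt(ratio) keeps the anchor area constant across ratios
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Sliding this set of anchors across every location of the feature map is what lets the RPN propose boxes of multiple scales in a single forward pass.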
R-FCN: object detection via region-based fully convolutional networks.118 In Faster R-CNN, each region proposal had to be cropped and resized to be fed into the Fast R-CNN network. The R-FCN attempts to speed up the network by making this process fully convolutional. It aims to swap the costly per-region subnetworks with a fully convolutional one, thus allowing the computation to be shared across the whole image. Furthermore, the R-FCN differs from the Faster R-CNN in the ROI pooling layer. The R-FCN proposes a method to use convolutional layers to create an ROI subnetwork. It uses the RPN introduced in the previous method to extract features and pass them on to the R-FCN. The R-FCN then aggregates the output of the last convolutional layer and generates the scores for each ROI. Instead of cropping the regions from the feature map, the R-FCN inputs the feature map into the regression and classification heads, creating an ROI map on the feature map. R-FCN uses ResNet-101 as the backbone of its architecture.101 ResNet-101 has 100 convolutional layers followed by a 1,000-way FC layer. The average pooling layer and the FC layer are removed, and the convolutional layers are used to compute the feature maps. A layer applied to the last convolutional block generates the score maps. A sibling convolutional layer is also applied to calculate the bounding box regression. On the PASCAL VOC 2007, it achieves an 83.6% mAP with the 101-layer ResNet. It achieves accuracy comparable with Faster R-CNN but is about 20 times faster than its Faster R-CNN counterpart. Thus, R-FCN introduces two advantages over its predecessors: first, CNNs are faster than FC layers. Second, the network becomes scale invariant since there is no FC layer to restrict the input image size.
FPN (feature pyramid networks for object detection)119: this method was designed to address an issue with Faster R-CNN. Faster R-CNN was generally made to address the scale-invariance problem introduced by Fast R-CNN. Faster R-CNN takes an input image and resizes it accordingly, meaning that the network has to run on the image several times with different box sizes, making it slow. The FPN deals with these different scales while maintaining the speed. The FPN is an extension of Faster R-CNN in the same manner that R-FCN is an extension of Faster R-CNN. Having robust scale invariance is important for object detection since the network should be able to recognize an object at any distance from the camera. Faster R-CNN aimed to tackle this issue by creating anchor boxes. This proved time-consuming since the anchor boxes had to be applied to each ROI. The FPN, however, creates multiple feature maps that represent the image at different scales. Hence, the feature map in the RPN is replaced by the FPN, removing the necessity of having multi-scale anchor boxes. The regression and classification are applied across these multiple feature maps. The FPN takes in an input image and outputs multiple feature maps with smaller height and width but deeper channels, known as the bottom-up pathway. The feature maps generated by the FPN go through a 1 × 1 convolutional layer with a depth of 256. The lateral connection is then applied, which adds the feature elements to the upsampled version of the feature map. Faster R-CNN runs on each scale, and predictions for each scale are generated. FPN comprises two paths: the bottom-up, which uses ResNet, and the top-down. In the bottom-up pathway, a CNN is applied to extract features. In the top-down pathway, the FPN constructs a higher resolution layer, but the object locations are no longer accurate because of the down- and upsampling.
Therefore, FPN adds a lateral connection between the constructed layers to increase the probability of predicting locations correctly. This method runs at 5 fps, as benchmarked by the previous methodology, with a state-of-the-art result on the COCO 2016 dataset. Images contain objects at different scales, making them challenging to detect. Using several anchor boxes to detect objects of different scales and ratios proves memory- and time-consuming. FPN pushes the accuracy boundaries by introducing a pyramid of feature maps to detect objects of different sizes and scales in an image. It is important to highlight that FPN is a feature detector and not an object detector. Therefore, FPN has to be used with an object detector for its ROIs.
Mask R-CNN120: this method extends Faster R-CNN by adding another layer to predict the object mask in parallel with the existing bounding box layer. This is a framework that enables instance segmentation at a state-of-the-art level. The mask branch added to the Faster R-CNN is a small FCN applied to each ROI, which predicts on a pixel-to-pixel basis. In brief, the Faster R-CNN has two stages: the RPN and the Fast R-CNN combined. The Mask R-CNN adopts the same notion with an identical first stage of RPN, and in the second stage, it outputs a mask for each ROI in parallel to predicting the class and box. The branch added to the second stage is an FCN on top of a CNN feature map. ROI pooling leads to misalignment; therefore, the RoIAlign layer is proposed to preserve pixel-level alignment. The main method Mask R-CNN introduces is RoIAlign, which preserves the pixel-spatial correspondences by replacing the quantization of ROI pooling with bilinear interpolation. The state-of-the-art results are achieved by ResNeXt101-FPN on the COCO dataset. The additional mask branch introduces only minor computational overhead. Mask R-CNN is a very promising instance segmentation method that is very flexible and efficient for instance-level segmentation. However, as with the original Faster R-CNN, this architecture struggles with smaller-sized objects, mainly because of the feature maps’ coarseness.
Other image classification and object detection architectures include but are not limited to NOC, Bayes, MR-CNN and S-CNN, Hyper-Net, ION, MSGR, StuffNet, OHEM, SDP+CRC, SubCNN, GBD-Net, PLANET, NIN, GoogLeNet, VGGNet, ResNet, DenseNet, RetinaNet, CornerNet, Inception, Hourglass, Dilated Residual Networks, Xception, DetNet, dual path networks (DPN), FishNet, ResNeXt, and GLoRe.93,103,121
For microcirculation analysis, we conclude that deep convolutional neural networks have lifted much of the burden of feature engineering, which was the main focus in the pre-DCNN era, and shifted the focus to designing more accurate and efficient network architectures. Despite the great successes, all methods suffer from the intense labor of creating the bounding boxes. All “newer” methods need substantially more RAM and GPU resources in exchange for increased accuracy.
Furthermore, detecting small-size objects and localizing these objects remains a challenge. Using the stated architectures still requires an experienced machine learning engineer to select the appropriate parameters of the algorithms to learn the patterns of the small-sized objects. Several solutions have been suggested in the literature, including multi-task learning (Stuffnet),122 multi-scale representation (IONet),123 and context modeling (HyperNet).124 On the other hand, methods have been proposed to deal with large data imbalances between the objects and the background, such as online mining algorithms (OHEM).125 For microcirculation analysis, we believe that a region proposal-based framework achieves better overall accuracy on microcirculation data.
Unified-based framework
You Only Look Once (YOLO)126: YOLO is a unified framework for object detection suggested by Redmon et al.126 The most significant difference between this architecture and the methods in the region proposal-based framework is the ability to track objects in real time. As mentioned earlier, Fast R-CNN proposes 2,000 regions to be predicted, while YOLO takes that down to 100 regions. On a Titan X GPU, YOLO can classify up to 45 frames per second, compared with Fast R-CNN at 0.5 frames per second. YOLO takes a 224 × 224 image as input and divides it into several grid cells. It then classifies each object within a grid cell by giving it two scores: the class it belongs to and a confidence percentage. The classification is done by 24 convolutional layers followed by 2 FC layers. According to the tests, YOLO was ineffective at localization and had low accuracy in comparison with R-CNN. Despite its high speed, the low accuracy makes YOLO an unsuitable choice for microcirculation analysis.
YOLOv2127: YOLOv2 addresses the precision issues of YOLOv1. It first replaces the GoogLeNet-based CNN classifier with DarkNet19. DarkNet19 is a simpler classifier utilizing 19 convolutional layers followed by 5 max-pooling layers, allowing for faster performance on the same dataset. It also removes the FC layer for prediction and uses the anchor boxes method instead, increasing recall by 7%. Batch normalization is added after each convolutional layer, increasing the mAP by 2%. Furthermore, it increases the image input from 224 × 224 to 448 × 448, which increases the mAP by an additional 4%. Whereas in Faster R-CNN the size of the anchor boxes is defined beforehand, YOLOv2 utilizes k-means clustering on the training set to find the right aspect ratios of the anchor boxes, increasing its accuracy by a further 5%.
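YOLOv2's anchor selection can be sketched as a k-means over box widths and heights with 1 − IoU as the distance, so large and small boxes cluster fairly. This is a simplified illustration (naive initialization, plain averaging), not the paper's implementation:

```python
def box_iou_wh(a, b):
    # IoU of two boxes given only (w, h), as if aligned at a common corner
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=10):
    # cluster training-box (w, h) pairs; distance = 1 - IoU, so we assign
    # each box to the centroid it overlaps most, then average each cluster
    centroids = boxes[:k]  # naive init: first k boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            i = max(range(k), key=lambda c: box_iou_wh(b, centroids[c]))
            clusters[i].append(b)
        for c in range(k):
            if clusters[c]:
                centroids[c] = (sum(b[0] for b in clusters[c]) / len(clusters[c]),
                                sum(b[1] for b in clusters[c]) / len(clusters[c]))
    return centroids
```

The resulting centroids serve as the anchor box shapes, matched to the size distribution of the training data rather than chosen by hand.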
YOLOv3128: this is an improved version of YOLOv2 that increases overall accuracy with multi-scale labeling of small objects. YOLOv3 uses three separate feature maps to predict the ROIs. It also uses DarkNet53 with independent logistic classifiers, allowing it to detect multiple overlapping objects in the image. With these changes, YOLOv3 is suited to detecting smaller objects within the grid, but it performs worse with medium and larger objects.
Single Shot MultiBox Detector (SSD)129: this improved the detection precision of a one-stage detector by implementing multi-reference and multi-resolution detection techniques. SSD detects objects of different sizes and scales across the network instead of applying detection only on the last layer. SSD maintains the speed of YOLO but has higher accuracy on the same standardized sets used to benchmark YOLO. SSD uses VGG16 as its backbone for image classification.
RetinaNet130: this introduces focal loss, which increases the prediction accuracy on small and medium objects compared with the previously mentioned detectors. In an image, the object of interest is typically much smaller than the background. Therefore, the large number of background regions creates a class imbalance. The focal loss function increases the weight of the minority class while reducing the weight associated with the majority class. RetinaNet achieves accuracies comparable with the region proposal-based frameworks at the expense of speed.
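The focal loss for a single binary prediction can be sketched as follows; the gamma and alpha defaults follow commonly cited values, and the function is an illustration, not RetinaNet's training code:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # focal loss for one binary prediction: (1 - pt)^gamma down-weights
    # easy examples (p near the true label), so the many easy background
    # samples no longer dominate the rare foreground class during training
    pt = p if y == 1 else 1.0 - p
    a = alpha if y == 1 else 1.0 - alpha
    return -a * (1.0 - pt) ** gamma * math.log(pt)
```

A confidently classified background pixel contributes almost nothing to the loss, while a misclassified foreground object still contributes strongly, which is how the imbalance is corrected.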
CornerNet131 challenges the use of anchor boxes by stating that they create the data imbalance issue in the first place. It also states that anchor boxes create unnecessary parameters that have to be tuned, which slows down training and prediction. Instead, CornerNet detects a bounding box as a pair of key points with a single convolutional neural network. It achieves the highest accuracy on the standard benchmark dataset; however, it is slower than YOLO.
When examining the architectures in the unified framework, we generally notice a trade-off between speed and accuracy. With the above methods, as accuracy increased, the time for detection also increased. In microcirculation analysis, having an accurate method is more important than a fast method. Moreover, the difference in analysis time between the unified framework and the region proposal-based framework in microcirculation image analysis can boil down to a few seconds. Therefore, we recommend the use of a region proposal-based framework for microcirculation analysis.
Upscaling images using deep neural networks
From our review, the microscope videos have very low resolution. Upscaling the image might help the researcher annotate the data better. The upscaling process involves improving an image’s details by increasing the dimensions and interpolating those extra pixels using a mathematical method. These mathematical methods include an enhanced deep super-resolution network (EDSR),132 an efficient sub-pixel convolutional neural network (ESPCN),133 a fast super-resolution convolutional neural network (FSRCNN),134 and a Laplacian pyramid super-resolution network (LapSRN).135 EDSR employs an architecture similar to ResNet without the batch normalization layer and the ReLU activation layer after the residual block. This architecture can be used to create a scale factor of 2. ESPCN extracts the feature maps and applies the upscaling at the end of the network. Like ESPCN, FSRCNN applies upscaling at the end of the network with a smaller filter size. LapSRN is based on the Laplacian pyramids concept, upscaling gradually through the network.
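For contrast with the learned methods above, plain bilinear upscaling can be sketched as follows; super-resolution networks such as EDSR learn a mapping that outperforms this simple interpolation. The grayscale list-of-lists image format is an assumption for illustration:

```python
def upscale_bilinear(img, factor=2):
    # naive bilinear upscaling of a 2D grayscale image (list of lists):
    # each output pixel is a distance-weighted blend of its four nearest
    # source pixels
    h, w = len(img), len(img[0])
    out = [[0.0] * (w * factor) for _ in range(h * factor)]
    for i in range(h * factor):
        for j in range(w * factor):
            y = min(i / factor, h - 1.0)
            x = min(j / factor, w - 1.0)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i][j] = (img[y0][x0] * (1 - dy) * (1 - dx)
                         + img[y0][x1] * (1 - dy) * dx
                         + img[y1][x0] * dy * (1 - dx)
                         + img[y1][x1] * dy * dx)
    return out
```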
Traditional computer vision object detection techniques
In this part of the review, we present computer vision object detection techniques that can be used for microcirculation analysis. These methods do not use neural networks for classification. Such techniques are also known as feature descriptors; they gained momentum from the early 1990s until the rise of deep learning in 2012.103 Although feature descriptors have fallen out of favor compared with deep learning on the benchmark datasets, they are still very relevant for microcirculation analysis. Their low computational requirements and simplicity make these algorithms easier to implement on low-powered or battery-powered devices in hospitals.
Computer vision techniques aim to separate the object of interest from the background by distinguishing between edges, colors, textures, corners, and other image properties. Such traditional computer vision techniques need their values coded beforehand, found via trial-and-error methods and domain expertise. Below are three computer vision detection methods that can be used for microcirculation analysis.
The template matching-based object detection136 methods consist of two steps. The first step is template generation, in which a template is generated by an expert based on the training set; the second step involves matching new data with that template image. A similarity measure is then applied to detect similarities between these images. Statistical methods, such as the sum of absolute differences or Euclidean distances, can quantify the similarities between the template and the test data. The template matching detection stage can be further categorized into two methods: rigid template matching (RTM) and deformable template matching (DTM). Further modifications of the stated template methods include the SIFT, the speeded-up robust features, and the binary robust independent elementary features.
The main disadvantage of RTM is that it is sensitive to slight changes in viewpoint, shadows, and other challenges, as stated earlier, while DTM needs a lot of geometrical engineering of the template beforehand. Moreover, these methods require two independent choices to be tuned: the template to be generated from the training set and the most suitable method for measuring similarities. This makes the approach time-consuming for microcirculation analysis.
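Template matching with a sum-of-absolute-differences similarity can be sketched as follows; the grayscale list-of-lists format is an assumption for illustration:

```python
def sad(a, b):
    # sum of absolute differences between two equally sized patches
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def match_template(img, tpl):
    # slide the template over every position in the image and return the
    # top-left corner of the patch with the smallest SAD (best match)
    th, tw = len(tpl), len(tpl[0])
    best, best_pos = float('inf'), (0, 0)
    for i in range(len(img) - th + 1):
        for j in range(len(img[0]) - tw + 1):
            patch = [row[j:j + tw] for row in img[i:i + th]]
            score = sad(patch, tpl)
            if score < best:
                best, best_pos = score, (i, j)
    return best_pos
```

The sensitivity of RTM is visible here: any shift in illumination changes every pixel difference, so the SAD score degrades even when the shape is unchanged.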
Another set of methods involves knowledge-based object detection.137 These can be further divided into geometric knowledge and context knowledge. A priori knowledge of the shape is encoded into the geometric knowledge methods. However, this is extremely difficult with capillaries since their shapes are irregular. Context knowledge encodes the spatial relationship between the object and the background, i.e., how neighboring pixels interact. Again, due to the different shades of skin and blood, this method is not preferred for microcirculation analysis.
Object-based image analysis (OBIA)138 is the most promising for microcirculation analysis and comprises two parts: the image segmentation part and the object classification part. OBIA aims to group similar pixels together based on statistical methods. In the case of microcirculation analysis, we would like to highlight the pixels of capillaries and separate them from the background. Promising methods under the OBIA techniques are background subtraction methods, geometric transformation methods, and image thresholding techniques. These methods can be used in combination with each other or independently to detect capillaries in an image.
Background subtraction methods
Background subtraction is a step in image preprocessing, where the goal is to remove the background and keep the object of interest.139 The three methods stated next can take in an image of microcirculation and attempt to calculate an approximation of the location of the capillaries.
• Mixture of Gaussians method: this method uses Gaussian mixture-based background/foreground segmentation.140 It models each pixel with a mixture of K Gaussian distributions and attempts to model the background. This method is based on using the L-recent window version after the sufficient statistics equations are calculated
• Improved mixture of Gaussians method: this method is also based on Gaussian mixture-based background/foreground segmentation but uses recursive equations to update the parameters.141 The previous method selects the background based on K Gaussian distributions, while this method uses adaptive density estimation142
• Statistical background image estimation: this method uses Bayesian segmentation with Kalman filters and Gale-Shapley143 matching to approximate the background image
Marcomini and Cunha144 compare the performances of all three above-mentioned methods using accuracy rate, precision rate, and processing time, and conclude that the improved mixture of Gaussians method had the best performance on their experimental dataset. It has also been shown to be the best of the three for background selection in the CapillaryNet paper.36
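As a simplified illustration of the background models above, the sketch below keeps a single Gaussian per pixel (the mixture methods maintain K of them) and flags as foreground any pixel that deviates from its per-pixel mean by more than k standard deviations:

```python
def classify_pixels(frames, new_frame, k=2.5):
    # frames: list of past frames, each a flat list of grayscale values;
    # a single-Gaussian-per-pixel simplification of the mixture models:
    # foreground = deviation from the pixel's mean exceeds k std devs
    n = len(frames)
    foreground = []
    for idx in range(len(new_frame)):
        vals = [f[idx] for f in frames]
        mean = sum(vals) / n
        std = (sum((v - mean) ** 2 for v in vals) / n) ** 0.5
        foreground.append(abs(new_frame[idx] - mean) > k * max(std, 1e-6))
    return foreground
```

The mixture-of-Gaussians methods extend this by keeping several such models per pixel and updating them recursively, so multi-modal backgrounds (e.g., flickering illumination) are handled.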
Image thresholding techniques
In its simplest form, thresholding involves changing a pixel value if it is above or below a certain value.145 This value or threshold can be determined by several methods, and the change of value can also be calculated by different methods. In microcirculation, this can be beneficial for determining the set of values that represent the capillary and the other that represents the background. Listed below are five thresholding techniques that can be used for microcirculation analysis.
- Binary threshold: this method takes two values: the threshold and the value to assign to pixels above the threshold. Pixels at or below the threshold are set to zero
- Truncating threshold: similar to the above, this method takes a threshold value. Pixels below the threshold remain unchanged, while pixels above it are clipped down to the threshold value
- Zero threshold: pixels below the threshold become zero, while pixels above it stay the same
The above methods are fairly simple, and the threshold values are chosen by the user. However, they are not optimal when different parts of the same image have different illumination: the object of interest may have a higher or lower value depending on the light. The next two methods were designed to deal with this issue.
- Adaptive thresholding146: the threshold is computed locally over a window of pixels rather than globally over the whole image. The local threshold can be calculated in two ways: as the mean of the window or as a weighted sum where the weights come from a Gaussian window. The window size is chosen by the user, so every window-sized region of the image gets its own computed threshold
- Otsu's binarization147: this method is optimal for images with two peaks in their histogram. It finds a value between the two peaks that minimizes the within-class variance of the two resulting classes and applies thresholding based on that value
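The thresholding variants above map directly onto simple array operations. Below is a minimal numpy sketch of the three fixed thresholds and of Otsu's method (adaptive thresholding is the same binary rule applied per window); the test image and parameter values are illustrative, not from the surveyed papers.

```python
import numpy as np

def binary_threshold(img, t, maxval=255):
    # Pixels above t become maxval; the rest become zero.
    return np.where(img > t, maxval, 0)

def truncate_threshold(img, t):
    # Pixels above t are clipped down to t; the rest are unchanged.
    return np.minimum(img, t)

def tozero_threshold(img, t):
    # Pixels at or below t become zero; the rest are unchanged.
    return np.where(img > t, img, 0)

def otsu_threshold(img):
    """Return the threshold maximizing between-class variance (Otsu's method),
    which is equivalent to minimizing the within-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Bimodal test image: dark background (~50) and bright structures (~200).
img = np.concatenate([np.full(500, 50), np.full(500, 200)]).astype(np.uint8)
t = otsu_threshold(img.reshape(20, 50))
print(t)  # lands between the two histogram peaks
```

In a microcirculation pipeline, the Otsu threshold would typically be applied after illumination correction, since a strongly uneven background breaks the two-peak assumption.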
Edges and lines
There are several methods for detecting edges and outlines of the capillaries. Below, we list those most relevant to microcirculation detection.
- Contours148: this involves drawing a line joining the pixels with the same color or intensity. In our case, this can help highlight the outline of a capillary. This method is most accurate when thresholding is applied beforehand, so that more pixels share similar values. Contours can be extracted with the marching squares algorithm, which linearly interpolates pixel values to locate the contour148
- The Canny edge detector149 can detect and quantify the capillary area. This is a multi-stage detector that uses Gaussian derivatives to compute the gradients. The Gaussian smoothing reduces noise in the image, and edges are traced by keeping only the local gradient maxima
- Skeletonization150 is a method used to find the central pixels of an object in order to recover its topology. It iterates over the image several times, peeling pixels from the object's border and moving toward the center until a one-pixel-wide skeleton remains
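To make the gradient-based idea concrete, the sketch below implements the first stages of a Canny-style detector in plain numpy: smoothing, Sobel gradients, and a magnitude threshold. Non-maximum suppression and hysteresis, which complete the Canny pipeline, are omitted for brevity; the box-blur stand-in for Gaussian smoothing and the threshold value are illustrative simplifications.

```python
import numpy as np

def sobel_edges(img, thresh=100.0):
    """Gradient-magnitude edge map: the first stages of a Canny-style
    detector (smoothing + gradients), without non-maximum suppression."""
    img = img.astype(float)
    # 3x3 box blur as a crude stand-in for Gaussian smoothing.
    pad = np.pad(img, 1, mode="edge")
    sm = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)) / 9.0
    p = np.pad(sm, 1, mode="edge")
    # Sobel derivatives computed with shifted slices (right-left, bottom-top).
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    mag = np.hypot(gx, gy)
    return mag > thresh

# A dark image with a bright square: edges appear on the square's border.
img = np.zeros((20, 20))
img[5:15, 5:15] = 255
edges = sobel_edges(img)
print(edges[5, 10], edges[10, 10])  # border pixel vs. interior pixel
```

Skeletonization would then be applied to the thresholded capillary mask, not to this edge map, to extract the central line used for width and density measurements.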
Histogram equalization
An image can be enhanced using histogram equalization methods. Histogram equalization151 can be done using three methods: standard equalization, contrast stretching, or adaptive equalization.
- In standard equalization, the most frequent values are spread out so that the cumulative distribution of the output is roughly linear
- In contrast stretching, the image pixels are rescaled so that the values between the 2nd and 98th percentiles span the full intensity range
- In adaptive equalization, pixels are changed locally based on a window size rather than the whole image
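The first two variants can be sketched directly from their definitions: standard equalization maps gray levels through the normalized cumulative histogram, and contrast stretching rescales the 2nd-98th percentile range. The numpy sketch below assumes 8-bit grayscale input; adaptive equalization would apply the same equalization per window rather than globally.

```python
import numpy as np

def equalize_hist(img):
    """Standard equalization: map gray levels through the normalized CDF so
    the cumulative distribution of the output is roughly linear."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    cdf_min = cdf[np.nonzero(hist)[0][0]]  # CDF at the lowest occurring level
    cdf = (cdf - cdf_min) / (cdf[-1] - cdf_min)
    return (cdf[img] * 255).astype(np.uint8)

def contrast_stretch(img, low_pct=2, high_pct=98):
    """Rescale pixels so the 2nd..98th percentile span the full 8-bit range."""
    lo, hi = np.percentile(img, (low_pct, high_pct))
    out = np.clip((img.astype(float) - lo) / (hi - lo), 0, 1)
    return (out * 255).astype(np.uint8)

# A low-contrast image with values squeezed into [100, 120]:
rng = np.random.default_rng(1)
img = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
s = contrast_stretch(img)
print(img.min(), img.max(), s.min(), s.max())  # narrow input, full-range output
```

For capillary images, such stretching makes faint vessels visually distinct before thresholding, at the cost of also amplifying noise, which is why a denoising step often follows.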
Image denoising
An image can be enhanced by reducing its noise; this is called image denoising.152 There are several ways to denoise an image: total variation filters, bilateral filters, wavelet denoising filters, and the non-local means algorithm.
- The total variation filter minimizes the L1 norm of the image gradient to remove noise
- The bilateral filter replaces each pixel with a weighted average of its neighbors within a user-chosen window, weighting by both spatial and intensity similarity
- The wavelet denoising filter represents the image in a wavelet basis and analyzes the wavelet coefficients. Coefficients below a certain threshold are treated as noise and removed
- The non-local means algorithm estimates the value of a pixel from similar patches elsewhere in the image. The patch comparison can use either spatial Gaussian weighting or uniform spatial weighting
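Of these filters, the bilateral filter is short enough to sketch in full: each output pixel is a weighted average of its neighbors, with weights that fall off with both spatial distance and intensity difference, so averaging stops at strong edges such as capillary walls. The `radius` and sigma parameters below are illustrative, not values tuned for microcirculation data.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_space=2.0, sigma_range=30.0):
    """Edge-preserving smoothing: weighted average of each pixel's neighbors,
    combining spatial closeness and intensity similarity."""
    img = img.astype(float)
    pad = np.pad(img, radius, mode="edge")
    h, w = img.shape
    acc = np.zeros_like(img)
    norm = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy:radius + dy + h,
                          radius + dx:radius + dx + w]
            w_space = np.exp(-(dx * dx + dy * dy) / (2 * sigma_space**2))
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_range**2))
            weight = w_space * w_range
            acc += weight * shifted
            norm += weight
    return acc / norm

# Noisy step edge: the noise is smoothed while the edge stays sharp,
# because the intensity term suppresses averaging across the step.
rng = np.random.default_rng(2)
img = np.where(np.arange(40) < 20, 50.0, 200.0)[None, :].repeat(40, axis=0)
noisy = img + rng.normal(0, 5, img.shape)
out = bilateral_filter(noisy)
print(np.abs(out - img).mean() < np.abs(noisy - img).mean())
```

This edge-preserving behavior is what distinguishes the bilateral filter from a plain Gaussian blur, which would also smear the capillary boundaries that later steps need to detect.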
State-of-the-art microcirculation image analysis techniques
In this section, we present the methods used by other researchers to develop their microcirculation analysis systems. The following methods utilize computer vision techniques to segment and, in some cases, quantify the capillaries. A summary table is shown in Table 1.
Table 1.
Summary of work in literature
| Name of Technique | Type of technique | Outcome | Summary of method |
|---|---|---|---|
| Dobbe et al.153 | traditional computer vision techniques | no accuracy reported | before using a traditional computer vision algorithm to locate capillaries, Dobbe et al. utilize a frame-averaging approach to eliminate the plasma and white blood cell gaps within the capillary |
| Hilty and co-workers34,35 | traditional computer vision techniques | no accuracy reported | Hilty and co-workers identify capillaries by producing a mean image over all frames and then sending the resulting image through two pipelines: the first categorizing vessels with diameters of 20–30 μm as capillaries, and the second classifying vessels with diameters of up to 400 μm as venules. The capillaries are then equalized using an adaptive histogram after being run through a modified curvature-based area recognition technique154 |
| Bezemer et al.155 | traditional computer vision techniques | no accuracy reported | Bezemer et al. fill the blood flow gaps caused by plasma and white blood cells with 2D cross-correlation. This is a superior strategy since it reduces the number of frames that must be ignored |
| Tam et al.156 | traditional computer vision techniques | no accuracy reported | Tam et al. detect capillaries through a semi-automated method that requires the user to select points on the image |
| Geyman et al.157 | traditional computer vision techniques | no accuracy reported | a manual approach using software to remove the major blood vessels and then applying hardcoded calculations to the pixels in the region of interest to count the total number of capillaries |
| Demir et al.158 | traditional computer vision techniques | no accuracy reported | uses contrast limited adaptive histogram equalization combined with a median filter |
| Cheng et al.159 | traditional computer vision techniques | no accuracy reported | generates a macro by combining different types of traditional computer vision techniques used to detect capillaries |
| Dai et al.160 | deep neural networks | accuracy of 60.94% reported | uses a shallow convolutional neural network |
| Hariyani et al.161 | deep neural networks | accuracy of 64% reported | uses a U-net architecture combined with a dual attention module162,163 |
| Prentašic et al.164 | deep neural networks | accuracy of 83% reported | uses a shallow convolutional neural network |
| Nivedha et al.165 | machine learning techniques | accuracy of 83.3% reported | uses a non-linear support vector machine |
| Javia et al.166 | deep neural networks | accuracy of 89.45% reported | uses a ResNet architecture |
| CapillaryNet36 | mixture of traditional computer vision techniques and deep neural networks | accuracy of 93% reported | uses a combination of traditional computer vision techniques (image background subtraction and image enhancement) and a shallow convolutional neural network |
Dobbe et al.153 used a frame averaging method to remove the cell gaps within the capillary and applied an algorithm to detect capillaries. They also removed capillaries that were out of focus since they considered them to add noise to the frame averaging method. However, this can significantly reduce the capillary density values.
The study of Hilty and co-workers34,35 is similar to that of Dobbe et al.,153 but with some minor changes. Hilty et al.34 use an algorithm to detect capillaries that are 20–30 μm wide. This type of detection can sometimes lead to the detection of artifacts, such as hair or stains of similar sizes. Furthermore, the mean of the images across the whole video is not always the best representation value since different parts of the video might have different lighting or capillaries that can be out of optimal focus. Moreover, videos with slight motion will have to be completely disregarded since the central line is calculated across all frames instead of per frame.
Bezemer et al.155 used 2D cross-correlation to fill in gaps in the images. However, this method also has some problems because it does not consider the dynamic changes in blood flow, which can reduce the prediction accuracy.
Tam et al.156 use a semi-automated method to detect capillaries that requires the user to select points on the image; the algorithm then decides whether a capillary is present. This method cannot be used in a clinical environment because analyzing a full microscopy video would take too long.
Geyman et al.157 take a more manual approach to find the number of capillaries. They used software to click away the major blood vessels and then applied hardcoded calculations to the pixels in the region of interest to count the total number of capillaries. This manual method is particularly subject to interobserver variability across datasets.
Demir et al.158 used a method called contrast limited adaptive histogram equalization (CLAHE) to detect capillaries. This method makes it easier to see capillaries by equalizing the contrast in an image. CLAHE is usually combined with a median filter, which is a tool that removes outliers from data. They also used an adjustable threshold to make the detection more accurate. However, this method is not perfect and sometimes requires manual adjustments depending on the lighting and skin thickness.
Cheng et al.159 created a system that allows users to manually increase the contrast and smooth images of capillaries to make them easier to see. The system also generates macros, which are sets of instructions that can be applied to future images to save time. However, the macros generated by the system may not work well if the brightness of the images or the thickness of the skin changes.
To find and quantify the capillaries in an image, Tama et al.167 use binarization, skeleton extraction, and segmentation. First, binarization is applied to the green channel, since it is assumed to have the highest contrast between the capillaries and the background. Next, the top-hat transform is used to reduce uneven illumination, followed by Wiener filtering to remove noisy pixels. Then, Gaussian smoothing is used to smooth the image. Finally, Otsu's thresholding is applied to segment the capillaries from the background. This method relies on the user finding the reference frame with the highest contrast in the video.
The works described next use ML techniques to segment and, in some cases, quantify the capillaries.
Prentašic et al.164 trained a neural network to segment the microvasculature structure. The segmentation took approximately 2 min per image, with an accuracy of 83%. The time taken and the high-end hardware needed to analyze a single image make it unsuitable for clinical use, where users expect results almost instantly.
Dai et al.160 employed a custom neural network for segmentation comparable to that of Prentašic et al.,164 but utilized five CNN blocks instead of three. For image enhancement, they employed gamma correction and CLAHE. They reported an accuracy of 60.94%, which is insufficient for clinical application.
Nivedha et al.165 classified the capillaries using the image’s green channel and a support vector machine. This approach required a manual step in which the user cropped the region of interest to enhance histogram equalization. They compared several denoising filters, including Gaussian, Wiener, median, and adaptive median filters, and concluded that the Gaussian filter was best suited to their data. Furthermore, they examined several segmentation methods, such as Otsu's method, k-means, and watershed, and determined that Otsu's approach was best suited to their data. The segmented images were then fed into an SVM, which produced an accuracy of 83.33%.
Javia et al.166 modify ResNet18101 to quantify capillaries and use the first 10 layers of the architecture. The main limitation of the ResNet architecture is that images have to be resized to 224 × 224; however, most capillary images are smaller than 100 × 100. This means images have to be scaled up, which makes this method inefficient and consumes more resources than needed. They reported an accuracy of 89.45% on their data; however, ResNet18101 has 11 million trainable parameters, and with such upscaling, training time can be up to several hours and prediction time up to several minutes. This can make it slow and inefficient in a clinical setting. The training and test times were not reported in their paper.
To construct their neural network, Ye et al.168 used transfer learning and the Single Shot Detector v.2. Because it is accurate and not overly sophisticated, this network is ideal for finding capillaries. The authors also calculated flow velocity using a spatiotemporal diagram analysis. This procedure is time-consuming, but it is precise.
Hariyani et al.161 used a U-net architecture combined with a dual attention module to try to improve accuracy for detecting capillaries in images. However, the accuracy was only 64%, which is not high enough to be used in a clinical setting.
The more accurate approaches require semiautomatic analysis, while the more automated methods are less precise and hence inappropriate for clinical use. Furthermore, none of the previously cited publications used parallel frameworks to determine capillary density. CapillaryNet is fully automated and can classify microcirculation videos in 0.9 s with 93% accuracy,36 whereas CapillaryX offers a parallel framework for calculating capillary density.169
A summary of the most recommended techniques is shown in Table 2.
Table 2.
Name and type of technique recommended for each microcirculation data parameter
| Name of technique | Type of technique | Goal in microcirculation images |
|---|---|---|
| Unified-based framework | deep neural networks | capillary detection and quantification |
| Enhanced deep super-resolution network and Laplacian pyramid super-resolution network | deep neural networks | image upscaling |
| Template matching-based object detection | traditional computer vision techniques | capillary detection and quantification |
| Knowledge-based object detection | traditional computer vision techniques | capillary detection |
| Background subtraction methods | traditional computer vision techniques | supports in capillary quantification |
| Image thresholding techniques | traditional computer vision techniques | supports in capillary quantification |
| Edges and lines | traditional computer vision techniques | supports in capillary quantification |
| Histogram equalization | traditional computer vision techniques | image enhancement |
| Image denoising | traditional computer vision techniques | remove noise from pixelated capillary images |
Conclusions
In this paper, we present the most promising deep learning and computer vision techniques that can automate microcirculation analysis, specifically the quantification of capillaries. Automating the quantification of capillary density may reveal important biomarkers that assist clinical personnel in treating critically ill patients with life-threatening diseases. With automation, the analysis time can be reduced from minutes to seconds, and interobserver variability decreases. We start by introducing the importance of analyzing microcirculation videos. We then present the two prominent ways of automating the analysis of microcirculation videos: traditional computer vision techniques and deep learning techniques. We discuss the types of deep neural networks and then dive into the details of convolutional neural networks. Convolutional neural networks are the preferred method for analyzing images since they have the highest accuracy in image classification competitions. We present why convolutional neural networks are good at what they do and what challenges they can overcome. We then present the anatomy of a convolutional neural network by discussing the FC layer, the convolutional layer, and the pooling layer. Moreover, we present different types of convolutional neural networks that combine these three modules differently. Since convolutional neural networks can only classify images and cannot localize the regions of the capillaries, we present deep learning object detection techniques. The deep learning object detection techniques consist of two main frameworks: the unified-based framework and the regional proposed framework. We present seven different algorithms for the regional proposed framework and six different algorithms for the unified-based framework.
We then discuss traditional computer vision object detection techniques, specifically non-ML-based object detection methods, such as background subtraction methods, image thresholding techniques, edges and lines, and image enhancement techniques. Throughout the sections of this article, we have recommended the algorithms that can be used to develop an automated capillary detector and quantifier. Our contribution with this article is to give researchers and developers a starting point for developing an automated algorithm for capillary detection and quantification.
To conclude, combining deep neural networks with traditional computer vision algorithms is the recommended approach to automating capillary detection and quantification. The traditional computer vision step is used for segmentation and area estimation, while the deep neural network classifies whether a capillary exists within that area. Using deep neural networks for the whole pipeline can be slow (due to the millions of parameters needed for a deep neural network) and computationally and financially expensive (due to the GPUs and advanced computers needed). Using purely traditional computer vision algorithms will reduce the overall accuracy of detecting capillaries, since artifacts (i.e., dirt, hair, and other objects on the surface of the skin) can be mistakenly quantified.
Acknowledgments
We would like to thank ODI Medical AS, a MedTech company specializing in the analysis of microcirculation systems, for the time they allowed us to speak with their medical doctors and surgeons who spent several years in the microcirculation analysis field. Specifically, we would like to thank Prof. Dr. Knut Kvernebo, a distinguished cardiac surgeon, and Anastasiya Dykyy, a biologist and medical equipment expert.
The Research Council of Norway provided the necessary funds for this project: Industrial Ph.D. project no. 305716 and ODI Medical AS.
Declaration of interests
Maged Helmy is 50% funded by ODI Medical AS and 50% funded by the Research Council of Norway.
Biography
About the author
Maged Helmy is a software engineer with 7+ years of experience in designing, developing, and shipping machine-learning enriched software. He is cloud certified by Microsoft and machine learning certified by Google. He completed his master’s degree in systems engineering with a focus on machine learning and is a PhD candidate in software engineering with a focus on deep learning at the University of Oslo.
Contributor Information
Maged Helmy, Email: magedaa@uio.no.
Eric Jul, Email: ericbj@uio.no.
References
- 1.Guven G., Hilty M.P., Ince C. Microcirculation: physiology, pathophysiology, and clinical application. Blood Purif. 2020;49:143–150. doi: 10.1159/000503775. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.De Backer D., Hollenberg S., Boerma C., Goedhart P., Büchele G., Ospina-Tascon G., Dobbe I., Ince C. How to evaluate the microcirculation: report of a round table conference. Crit. Care. 2007;11:R101. doi: 10.1186/cc6118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Shore A.C. Capillaroscopy and the measurement of capillary pressure. Br. J. Clin. Pharmacol. 2000;50:501–513. doi: 10.1046/j.1365-2125.2000.00278.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Bateman R.M., Sharpe M.D., Ellis C.G. Bench-to-bedside review: microvascular dysfunction in sepsis–hemodynamics, oxygen transport, and nitric oxide. Crit. Care. 2003;7:359–373. doi: 10.1186/cc2353. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Zafrani L., Ince C. Microcirculation in acute and chronic kidney diseases. Am. J. Kidney Dis. 2015;66:1083–1094. doi: 10.1053/j.ajkd.2015.06.019. [DOI] [PubMed] [Google Scholar]
- 6.Ovadia-Blechman Z., Gritzman A., Shuvi M., Gavish B., Aharonson V., Rabin N. The response of peripheral microcirculation to gravity-induced changes. Clin. Biomech. 2018;57:19–25. doi: 10.1016/j.clinbiomech.2018.06.005. [DOI] [PubMed] [Google Scholar]
- 7.Parker M.J.S., McGill N.W. In: Connective Tissue Disease - Current State of the Art [Working Title] Takeda A., editor. IntechOpen; 2019. The established and evolving role of nailfold capillaroscopy in Connective- Tissue disease; pp. 1–13. [Google Scholar]
- 8.Nama V., Onwude J., Manyonda I.T., Antonios T.F. Is capillary rarefaction an independent risk marker for cardiovascular disease in south asians? J. Hum. Hypertens. 2011;25:465–466. doi: 10.1038/jhh.2011.1. [DOI] [PubMed] [Google Scholar]
- 9.Houben A.J., Martens R.J., Stehouwer C.D. Assessing microvascular function in humans from a chronic disease perspective. J. Am. Soc. Nephrol. 2017;28:3461–3472. doi: 10.1681/ASN.2017020157. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.de Graaff J.C., Ubbink D.T., van der Spruit J.A., Lagarde S.M., Jacobs M.J. Influence of peripheral arterial disease on capillary pressure in the foot. J. Vasc. Surg. 2003;38:1067–1074. doi: 10.1016/s0741-5214(03)00603-7. [DOI] [PubMed] [Google Scholar]
- 11.Fagrell B., Intaglietta M. Microcirculation: its significance in clinical and molecular medicine, ” en. J. Intern. Med. 1997;241:349–362. doi: 10.1046/j.1365-2796.1997.125148000.x. [DOI] [PubMed] [Google Scholar]
- 12.Houtman P.M., Kallenberg C.G., Wouda A.A., The T.H. Decreased nailfold capillary density in raynaud’s phenomenon: a reflection of immunologically mediated local and systemic vascular disease? Ann. Rheum. Dis. 1985;44:603–609. doi: 10.1136/ard.44.9.603. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Schmeling H., Stephens S., Goia C., Manlhiot C., Schneider R., Luthra S., Stringer E., Feldman B.M. Nailfold capillary density is importantly associated over time with muscle and skin disease activity in juvenile dermatomyositis. Rheumatology. 2011;50:885–893. doi: 10.1093/rheumatology/keq407. [DOI] [PubMed] [Google Scholar]
- 14.Duscha B.D., Kraus W.E., Keteyian S.J., Sullivan M.J., Green H.J., Schachat F.H., Pippen A.M., Brawner C.A., Blank J.M., Annex B.H. Capillary density of skeletal muscle: a contributing mechanism for exercise intolerance in class II-III chronic heart failure independent of other peripheral alterations. J. Am. Coll. Cardiol. 1999;33:1956–1963. doi: 10.1016/s0735-1097(99)00101-1. [DOI] [PubMed] [Google Scholar]
- 15.Robbins J.L., Jones W.S., Duscha B.D., Allen J.D., Kraus W.E., Regensteiner J.G., Hiatt W.R., Annex B.H. Relationship between leg muscle capillary density and peak hyperemic blood flow with endurance capacity in peripheral artery disease. J. Appl. Physiol. 2011;111:81–86. doi: 10.1152/japplphysiol.00141.2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Moeini M., Lu X., Avti P.K., Damseh R., Bélanger S., Picard F., Boas D., Kakkar A., Lesage F. Compromised microvascular oxygen delivery increases brain tissue vulnerability with age. Sci. Rep. 2018;8:8219. doi: 10.1038/s41598-018-26543-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.López A., Grignola J.C., Angulo M., Alvez I., Nin N., Lacuesta G., Baz M., Cardinal P., Prestes I., Bouchacourt J.P., et al. Effects of early hemodynamic resuscitation on left ventricular performance and microcirculatory function during endotoxic shock. Intensive Care Med Exp. 2015;3:49. doi: 10.1186/s40635-015-0049-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.De Backer D., Creteur J., Preiser J.-C., Dubois M.-J., Vincent J.-L. Microvascular blood flow is altered in patients with sepsis. Am. J. Respir. Crit. Care Med. 2002;166:98–104. doi: 10.1164/rccm.200109-016oc. [DOI] [PubMed] [Google Scholar]
- 19.Wester T., Awan Z.A., Kvernebo T.S., Salerud G., Kvernebo K. Skin microvascular morphology and hemodynamics during treatment with veno-arterial extra-corporeal membrane oxygenation. Clin. Hemorheol. Microcirc. 2014;56:119–131. doi: 10.3233/CH-131670. [DOI] [PubMed] [Google Scholar]
- 20.Ellis C.G., Jagger J., Sharpe M. The microcirculation as a functional system. Crit. Care. 2005;9:S3–S8. doi: 10.1186/cc3751. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Pittman R.N. Oxygen transport and exchange in the microcirculation. Microcirculation. 2005;12:59–70. doi: 10.1080/10739680590895064. [DOI] [PubMed] [Google Scholar]
- 22.den Uil C.A., Klijn E., Lagrand W.K., Brugts J.J., Ince C., Spronk P.E., Simoons M.L. The microcirculation in health and critical disease. Prog. Cardiovasc. Dis. 2008;51:161–170. doi: 10.1016/j.pcad.2008.07.002. [DOI] [PubMed] [Google Scholar]
- 23.Popel A.S., Johnson P.C. Microcirculation and hemorheology. Annu. Rev. Fluid Mech. 2005;37:43–69. doi: 10.1146/annurev.fluid.37.042604.133933. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.De Backer D., Hollenberg S., Boerma C., Goedhart P., Büchele G., Ospina-Tascon G., Dobbe I., Ince C. How to evaluate the microcirculation: report of a round table conference. Crit. Care. 2007;11:R101. doi: 10.1186/cc6118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Cassoobhoy A. What is the definition of capillaries? 2020. https://www.webmd.com/heart-disease/heart-failure/qa/what-is-the-definition-of-capillaries visited on 10/27/2020.
- 26.Shore A.C. Capillaroscopy and the measurement of capillary pressure. Br. J. Clin. Pharmacol. 2000;50:501–513. doi: 10.1046/j.1365-2125.2000.00278.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Maricq H.R., Spencer-Green G., LeRoy E.C. Skin capillary abnormalities as indicators of organ involvement in scleroderma (systemic sclerosis), raynaud’s syndrome and dermatomyositis. Am. J. Med. 1976;61:862–870. doi: 10.1016/0002-9343(76)90410-1. [DOI] [PubMed] [Google Scholar]
- 28.Wester T., Awan Z.A., Kvernebo T.S., Salerud G., Kvernebo K. Skin microvascular morphology and hemodynamics during treatment with venoarterial extra-corporeal membrane oxygenation. Clin. Hemorheol. Microcirc. 2014;56:119–131. doi: 10.3233/CH-131670. [DOI] [PubMed] [Google Scholar]
- 29.De Backer D., Creteur J., Preiser J.-C., Dubois M.-J., Vincent J.-L. Microvascular blood flow is altered in patients with sepsis. Am. J. Respir. Crit. Care Med. 2002;166:98–104. doi: 10.1164/rccm.200109-016oc. [DOI] [PubMed] [Google Scholar]
- 30.Top A.P.C., Ince C., de Meij N., van Dijk M., Tibboel D. Persistent low microcirculatory vessel density in nonsurvivors of sepsis in pediatric intensive care. Crit. Care Med. 2011;39:8–13. doi: 10.1097/CCM.0b013e3181fb7994. [DOI] [PubMed] [Google Scholar]
- 31.Natalello G., De Luca G., Gigante L., Campochiaro C., De Lorenzis E., Verardi L., Paglionico A., Petricca L., Martone A.M., Calvisi S., et al. Nailfold capillaroscopy findings in patients with coronavirus disease 2019: broadening the spectrum of covid-19 microvascular involvement. Microvasc. Res. 2021;133:104071. doi: 10.1016/j.mvr.2020.104071. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Kanoore Edul V.S., Caminos Eguillor J.F., Ferrara G., Estenssoro E., Siles D.S.P., Cesio C.E., Dubin A. Microcirculation alterations in severe covid-19 pneumonia. J. Crit. Care. 2021;61:73–75. doi: 10.1016/j.jcrc.2020.10.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Ince C., Boerma E.C., Cecconi M., De Backer D., Shapiro N.I., Duranteau J., Pinsky M.R., Artigas A., Teboul J.-L., Reiss I.K.M., et al. Second consensus on the assessment of sublingual microcirculation in critically ill patients: results from a task force of the european society of intensive care medicine. Intensive Care Med. 2018;44:281–299. doi: 10.1007/s00134-018-5070-7. [DOI] [PubMed] [Google Scholar]
- 34.Hilty M.P., Guerci P., Ince Y., Toraman F., Ince C. Microtools enables automated quantification of capillary density and red blood cell velocity in handheld vital microscopy. Commun. Biol. 2019;2:1–15. doi: 10.1038/s42003-019-0473-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Hilty M.P., Akin S., Boerma C., Donati A., Erdem Ö., Giaccaglia P., Guerci P., Milstein D.M., Montomoli J., Toraman F., et al. Automated algorithm analysis of sublingual microcirculation in an international multicentral database identifies alterations associated with disease and mechanism of resuscitation. Crit. Care Med. 2020;48:e864–e875. doi: 10.1097/CCM.0000000000004491. [DOI] [PubMed] [Google Scholar]
- 36.Helmy Abdou M.A., Truong T.T., Dykky A., Ferreira P., Jul E. Capillarynet: an automated system to quantify skin capillary density and red blood cell velocity from handheld vital microscopy. Artif. Intell. Med. 2022;127:102287. doi: 10.1016/j.artmed.2022.102287. [DOI] [PubMed] [Google Scholar]
- 37.Martini R. The compelling arguments for the need of microvascular investigation in covid-19 critical patients. Clin. Hemorheol. Microcirc. 2020;75:27–34. doi: 10.3233/CH-200895. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Ocak I., Kara A., Ince C. Monitoring microcirculation. Best Pract. Res. Clin. Anaesthesiol. 2016;30:407–418. doi: 10.1016/j.bpa.2016.10.008. [DOI] [PubMed] [Google Scholar]
- 39.Ashruf J.F., Bruining H.A., Ince C. New insights into the pathophysiology of cardiogenic shock: the role of the microcirculation. Curr. Opin. Crit. Care. 2013;19:381–386. doi: 10.1097/MCC.0b013e328364d7c8.
- 40.Cuthbertson C.M., Christophi C. Disturbances of the microcirculation in acute pancreatitis. Br. J. Surg. 2006;93:518–530. doi: 10.1002/bjs.5316.
- 41.O’Mahony N., Campbell S., Carvalho A., Harapanahalli S., Hernandez G.V., Krpalkova L., et al. In: Science and Information Conference. Arai K., editor. Springer; 2019. Deep learning vs. traditional computer vision; pp. 128–144.
- 42.Voulodimos A., Doulamis N., Doulamis A., Protopapadakis E. Deep learning for computer vision: a brief review. Comput. Intell. Neurosci. 2018;2018:7068349. doi: 10.1155/2018/7068349.
- 43.Cutolo M., Pizzorni C., Secchi M.E., Sulli A. Capillaroscopy. Best Pract. Res. Clin. Rheumatol. 2008;22:1093–1108. doi: 10.1016/j.berh.2008.09.001.
- 44.Ruaro B., Smith V., Sulli A., Decuman S., Pizzorni C., Cutolo M. Methods for the morphological and functional evaluation of microvascular damage in systemic sclerosis. Korean J. Intern. Med. 2015;30:1–5. doi: 10.3904/kjim.2015.30.1.1.
- 45.Dilken O., Ergin B., Ince C. Assessment of sublingual microcirculation in critically ill patients: consensus and debate. Ann. Transl. Med. 2020;8:793. doi: 10.21037/atm.2020.03.222.
- 46.Kvernebo A.K., Miyamoto T., Sporastøyl A.H., Wikslund L.K., Måsøy S.E., Drolsum L., Moe M.C., Salerud G., Fukamachi K., Kvernebo K. Quantification of ocular surface microcirculation by computer assisted video microscopy and diffuse reflectance spectroscopy. Exp. Eye Res. 2020;201:108312. doi: 10.1016/j.exer.2020.108312.
- 47.Kraemer R., Kabbani M., Sorg H., Herold C., Branski L., Vogt P.M., Knobloch K. Diabetes and peripheral arterial occlusive disease impair the cutaneous tissue oxygenation in dorsal hand microcirculation of elderly adults: implications for hand rejuvenation. Dermatol. Surg. 2012;38:1136–1142. doi: 10.1111/j.1524-4725.2012.02466.x.
- 48.Hasegawa K., Pereira B.P., Pho R.W. The microvasculature of the nail bed, nail matrix, and nail fold of a normal human fingertip. J. Hand Surg. Am. 2001;26:283–290. doi: 10.1053/jhsu.2001.21519.
- 49.CapillaryScope. [Online] https://www.dino-lite.eu/index.php/en/products/medical/capillaryscope
- 50.Video capillaroscopy. [Online] https://www.optiliamedical.eu/products/2/capillaroscope/11/Optilia%20Digital%20Capillaroscopy%20System,%20Extensive%20kit/
- 51.Digital capillaroscopy. 2020. [Online] https://www.inspect-is.com/capillaroscopypro/
- 52.Smart G-Scope™ Europe. 2021. [Online] https://g-scope.eu/
- 53.Anders H.J., Sigl T., Schattenkirchner M. Differentiation between primary and secondary Raynaud’s phenomenon: a prospective study comparing nailfold capillaroscopy using an ophthalmoscope or stereomicroscope. Ann. Rheum. Dis. 2001;60:407–409. doi: 10.1136/ard.60.4.407.
- 54.Russakovsky O., Deng J., Su H., Krause J., Satheesh S., Ma S., Huang Z., Karpathy A., Khosla A., Bernstein M., et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015;115:211–252.
- 55.Zhou B., Lapedriza A., Khosla A., Oliva A., Torralba A. Places: a 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018;40:1452–1464. doi: 10.1109/TPAMI.2017.2723009.
- 56.Kuznetsova A., Rom H., Alldrin N., Uijlings J., Krasin I., Pont-Tuset J., Kamali S., Popov S., Malloci M., Kolesnikov A., et al. The Open Images Dataset v4. Int. J. Comput. Vis. 2020;128:1956–1981.
- 57.Jordan M.I., Mitchell T.M. Machine learning: trends, perspectives, and prospects. Science. 2015;349:255–260. doi: 10.1126/science.aaa8415.
- 58.Zhang X.-D. In: A Matrix Algebra Approach to Artificial Intelligence. Chang C.L., editor. Springer; 2020. Machine learning; pp. 223–440.
- 59.Burkov A. Andriy Burkov; 2019. The Hundred-Page Machine Learning Book.
- 60.Mayo M. KDnuggets; 2018. Frameworks for Approaching the Machine Learning Process.
- 61.Willemink M.J., Koszek W.A., Hardell C., Wu J., Fleischmann D., Harvey H., Folio L.R., Summers R.M., Rubin D.L., Lungren M.P. Preparing medical imaging data for machine learning. Radiology. 2020;295:4–15. doi: 10.1148/radiol.2020192224.
- 62.Ward L., Agrawal A., Choudhary A., Wolverton C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2016;2:1–7.
- 63.Kohli M., Prevedello L.M., Filice R.W., Geis J.R. Implementing machine learning in radiology practice and research. AJR Am. J. Roentgenol. 2017;208:754–760. doi: 10.2214/AJR.16.17224.
- 64.Caruana R., Niculescu-Mizil A. Proceedings of the 23rd International Conference on Machine Learning. 2006. An empirical comparison of supervised learning algorithms; pp. 161–168.
- 65.Dayan P., Sahani M., Deback G. Unsupervised learning. The MIT Encyclopedia of the Cognitive Sciences. 1999:857–859.
- 66.Friedman J., Hastie T., Tibshirani R. Vol. 1. Springer Series in Statistics, New York; 2001. (The Elements of Statistical Learning).
- 67.Bousquet O., von Luxburg U., Rätsch G. Vol. 3176. Springer; 2004. (Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2-14, 2003, Tübingen, Germany, August 4-16, 2003, Revised Lectures).
- 68.Sutton R.S., Barto A.G. MIT Press; 2018. Reinforcement Learning: An Introduction.
- 69.Szepesvári C. Algorithms for reinforcement learning. Synth. Lect. Artif. Intell. Mach. Learn. 2010;4:1–103.
- 70.Wiering M.A., Van Otterlo M. Reinforcement learning. Adapt. Learn. Optim. 2012;12.
- 71.Silver D., Hubert T., Schrittwieser J., Antonoglou I., Lai M., Guez A., Lanctot M., Sifre L., Kumaran D., Graepel T., et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science. 2018;362:1140–1144. doi: 10.1126/science.aar6404.
- 72.Pan S.J., Yang Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010;22:1345–1359.
- 73.Weiss K., Khoshgoftaar T.M., Wang D. A survey of transfer learning. J. Big Data. 2016;3:9–40.
- 74.Torrey L., Shavlik J. Transfer learning. Handbook of Research on Machine Learning Applications. IGI Global. 2009;3:17–35.
- 75.LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
- 76.Goodfellow I., Bengio Y., Courville A. MIT Press; 2016. Deep Learning.
- 77.Shen D., Wu G., Suk H.-I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017;19:221–248. doi: 10.1146/annurev-bioeng-071516-044442.
- 78.Deng L., Yu D. Deep learning: methods and applications. Found. Trends Signal Process. 2014;7:197–387.
- 79.Medsker L.R., Jain L. Recurrent Neural Networks: Design and Applications. 2001;5:64–67.
- 80.Creswell A., White T., Dumoulin V., Arulkumaran K., Sengupta B., Bharath A.A. Generative adversarial networks: an overview. IEEE Signal Process. Mag. 2018;35:53–65.
- 81.O’Shea K., Nash R. An introduction to convolutional neural networks. arXiv. 2015. Preprint. doi: 10.48550/arXiv.1511.08458.
- 82.Mandic D., Chambers J. Wiley; 2001. Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability.
- 83.Cheng J., Dong L., Lapata M. Long short-term memory-networks for machine reading. arXiv. 2016. Preprint. doi: 10.48550/arXiv.1601.06733.
- 84.Dey R., Salem F.M. 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE; 2017. Gate-variants of gated recurrent unit (GRU) neural networks; pp. 1597–1600.
- 85.Chung J., Gulcehre C., Cho K., Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv. 2014. Preprint. doi: 10.48550/arXiv.1412.3555.
- 86.Pan Z., Yu W., Yi X., Khan A., Yuan F., Zheng Y. Recent progress on generative adversarial networks (GANs): a survey. IEEE Access. 2019;7:36322–36333.
- 87.Saxena D., Cao J. Generative adversarial networks (GANs): challenges, solutions, and future directions. ACM Comput. Surv. 2022;54:1–42.
- 88.Anwar S.M., Majid M., Qayyum A., Awais M., Alnowami M., Khan M.K. Medical image analysis using convolutional neural networks: a review. J. Med. Syst. 2018;42:1–13. doi: 10.1007/s10916-018-1088-1.
- 89.Litjens G., Kooi T., Bejnordi B.E., Setio A.A.A., Ciompi F., Ghafoorian M., Van Der Laak J.A., Van Ginneken B., Sánchez C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017;42:60–88. doi: 10.1016/j.media.2017.07.005.
- 90.Yamashita R., Nishio M., Do R.K.G., Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9:611–629. doi: 10.1007/s13244-018-0639-9.
- 91.LeCun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition. Proc. IEEE. 1998;86:2278–2324.
- 92.El-Amir H., Hamdy M. Apress; 2019. Deep Learning Pipeline: Building a Deep Learning Model with TensorFlow.
- 93.Zhao Z.-Q., Zheng P., Xu S.-T., Wu X. Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 2019;30:3212–3232. doi: 10.1109/TNNLS.2018.2876865.
- 94.Karpathy A. Stanford University CS231n: convolutional neural networks for visual recognition. 2018. http://cs231n.stanford.edu/syllabus.html
- 95.Yegnanarayana B. PHI Learning Pvt. Ltd.; 2009. Artificial Neural Networks.
- 96.Yiqiao Y. Columbia University; 2018. Deep Learning Notes.
- 97.Kim P. MATLAB Deep Learning. Springer; 2017. Convolutional neural network; pp. 121–147.
- 98.Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012;25:1097–1105.
- 99.Simonyan K., Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv. 2014. Preprint. doi: 10.48550/arXiv.1409.1556.
- 100.Szegedy C., Liu W., Jia Y., Sermanet P., Reed S., Anguelov D., Erhan D., Vanhoucke V., Rabinovich A. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2015. Going deeper with convolutions; pp. 1–9.
- 101.He K., Zhang X., Ren S., Sun J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. Deep residual learning for image recognition; pp. 770–778.
- 102.Amit Y., Felzenszwalb P., Girshick R. Object detection. Computer Vision: A Reference Guide. 2020:1–9.
- 103.Liu L., Ouyang W., Wang X., Fieguth P., Chen J., Liu X., Pietikäinen M. Deep learning for generic object detection: a survey. Int. J. Comput. Vis. 2020;128:261–318.
- 104.Lowe D.G. Vol. 2. IEEE; 1999. Object recognition from local scale-invariant features; pp. 1150–1157. (Proceedings of the Seventh IEEE International Conference on Computer Vision).
- 105.Alom M.Z., Taha T.M., Yakopcic C., Westberg S., Sidike P., Nasrin M.S., Van Esesn B.C., Awwal A.A.S., Asari V.K. The history began from AlexNet: a comprehensive survey on deep learning approaches. arXiv. 2018. Preprint. doi: 10.48550/arXiv.1803.01164.
- 106.Everingham M., Van Gool L., Williams C.K.I., Winn J., Zisserman A. The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010;88:303–338.
- 107.Lin T.-Y., Maire M., Belongie S., Hays J., Perona P., Ramanan D., Dollár P., Zitnick C.L. European Conference on Computer Vision. Springer; 2014. Microsoft COCO: common objects in context; pp. 740–755.
- 108.Hoiem D., Chodpathumwan Y., Dai Q. European Conference on Computer Vision. Springer; 2012. Diagnosing error in object detectors; pp. 340–353.
- 109.Girshick R., Donahue J., Darrell T., Malik J. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2014. Rich feature hierarchies for accurate object detection and semantic segmentation; pp. 580–587.
- 110.Uijlings J.R.R., van de Sande K.E.A., Gevers T., Smeulders A.W.M. Selective search for object recognition. Int. J. Comput. Vis. 2013;104:154–171.
- 111.Athiwaratkun B., Kang K. Feature representation in convolutional neural networks. arXiv. 2015. Preprint. doi: 10.48550/arXiv.1507.02313.
- 112.He K., Zhang X., Ren S., Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015;37:1904–1916. doi: 10.1109/TPAMI.2015.2389824.
- 113.Lazebnik S., Schmid C., Ponce J. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). Vol. 2. IEEE; 2006. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories; pp. 2169–2178.
- 114.Girshick R. Proceedings of the IEEE International Conference on Computer Vision. IEEE; 2015. Fast R-CNN; pp. 1440–1448.
- 115.Xue J., Li J., Gong Y. Interspeech. ISCA; 2013. Restructuring of deep neural network acoustic models with singular value decomposition; pp. 2365–2369.
- 116.Ren S., He K., Girshick R., Sun J. Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015;28:91–99.
- 117.Zitnick C.L., Dollár P. European Conference on Computer Vision. Springer; 2014. Edge boxes: locating object proposals from edges; pp. 391–405.
- 118.Dai J., Li Y., He K., Sun J. In: Advances in Neural Information Processing Systems. Lee D., Sugiyama M., Luxburg U., Guyon I., Garnett R., editors. MIT Press; 2016. R-FCN: object detection via region-based fully convolutional networks; pp. 379–387.
- 119.Lin T.-Y., Dollár P., Girshick R., He K., Hariharan B., Belongie S. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2017. Feature pyramid networks for object detection; pp. 2117–2125.
- 120.He K., Gkioxari G., Dollár P., Girshick R. Proceedings of the IEEE International Conference on Computer Vision. IEEE; 2017. Mask R-CNN; pp. 2961–2969.
- 121.Zhang X., Yang Y.-H., Han Z., Wang H., Gao C. Object class detection: a survey. ACM Comput. Surv. 2013;46:1–53.
- 122.Brahmbhatt S., Christensen H.I., Hays J. 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE; 2017. StuffNet: using ‘stuff’ to improve object detection; pp. 934–943.
- 123.Bell S., Zitnick C.L., Bala K., Girshick R. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks; pp. 2874–2883.
- 124.Kong T., Yao A., Chen Y., Sun F. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. HyperNet: towards accurate region proposal generation and joint object detection; pp. 845–853.
- 125.Shrivastava A., Gupta A., Girshick R. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. Training region-based object detectors with online hard example mining; pp. 761–769.
- 126.Redmon J., Divvala S., Girshick R., Farhadi A. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. You only look once: unified, real-time object detection; pp. 779–788.
- 127.Redmon J., Farhadi A. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2017. YOLO9000: better, faster, stronger; pp. 7263–7271.
- 128.Redmon J., Farhadi A. YOLOv3: an incremental improvement. arXiv. 2018. Preprint. doi: 10.48550/arXiv.1804.02767.
- 129.Hu D. Proceedings of SAI Intelligent Systems Conference. Springer; 2019. An introductory survey on attention mechanisms in NLP problems; pp. 432–448.
- 130.Lin T.-Y., Goyal P., Girshick R., He K., Dollár P. Proceedings of the IEEE International Conference on Computer Vision. IEEE; 2017. Focal loss for dense object detection; pp. 2980–2988.
- 131.Law H., Deng J. Proceedings of the European Conference on Computer Vision (ECCV). The Computer Vision Foundation; 2018. CornerNet: detecting objects as paired keypoints; pp. 734–750.
- 132.Lim B., Son S., Kim H., Nah S., Mu Lee K. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE; 2017. Enhanced deep residual networks for single image super-resolution; pp. 136–144.
- 133.Shi W., Caballero J., Huszár F., Totz J., Aitken A.P., Bishop R., Rueckert D., Wang Z. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network; pp. 1874–1883.
- 134.Dong C., Loy C.C., Tang X. European Conference on Computer Vision. Springer; 2016. Accelerating the super-resolution convolutional neural network; pp. 391–407.
- 135.Lai W.-S., Huang J.-B., Ahuja N., Yang M.-H. Fast and accurate image super-resolution with deep laplacian pyramid networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019;41:2599–2613. doi: 10.1109/TPAMI.2018.2865304.
- 136.Dufour R.M., Miller E.L., Galatsanos N.P. Template matching based object recognition with unknown geometric parameters. IEEE Trans. Image Process. 2002;11:1385–1396. doi: 10.1109/TIP.2002.806245.
- 137.Greig D.W., Denny M. In: Posa F., editor. Vol. 4883. International Society for Optics and Photonics; 2003. Knowledge-based methods for small-object detection in SAR images; pp. 121–130. (SAR Image Analysis, Modeling, and Techniques V).
- 138.Hossain M.D., Chen D. Segmentation for object-based image analysis (OBIA): a review of algorithms and challenges from remote sensing perspective. ISPRS J. Photogrammetry Remote Sens. 2019;150:115–134.
- 139.Benezeth Y., Jodoin P.-M., Emile B., Laurent H., Rosenberger C. Comparative study of background subtraction algorithms. J. Electron. Imaging. 2010;19:033003.
- 140.KaewTraKulPong P., Bowden R. In: Video-Based Surveillance Systems. Remagnino P., Jones G.A., Paragios N., Regazzoni C.S., editors. Springer; 2002. An improved adaptive background mixture model for real-time tracking with shadow detection; pp. 135–144.
- 141.Zivkovic Z. Vol. 2. IEEE; 2004. Improved adaptive Gaussian mixture model for background subtraction; pp. 28–31. (Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004).
- 142.Zivkovic Z., Van Der Heijden F. Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recogn. Lett. 2006;27:773–780.
- 143.Li L., Huang W., Gu I.Y.-H., Tian Q. Statistical modeling of complex backgrounds for foreground object detection. IEEE Trans. Image Process. 2004;13:1459–1472. doi: 10.1109/tip.2004.836169.
- 144.Marcomini L., Cunha A.L. A comparison between background modelling methods for vehicle segmentation in highway traffic videos. arXiv. 2018. Preprint. doi: 10.48550/arXiv.1810.02835.
- 145.Sezgin M., Sankur B. Survey over image thresholding techniques and quantitative performance evaluation. J. Electron. Imaging. 2004;13:146–165.
- 146.Bradley D., Roth G. Adaptive thresholding using the integral image. J. Graph. Tools. 2007;12:13–21.
- 147.Yousefi J. University of Guelph; 2011. Image Binarization Using Otsu Thresholding Algorithm.
- 148.Bradski G., Kaehler A. O’Reilly Media, Inc.; 2008. Learning OpenCV: Computer Vision with the OpenCV Library.
- 149.McIlhagga W. The canny edge detector revisited. Int. J. Comput. Vis. 2011;91:251–261.
- 150.Sherstyuk A. Kernel functions in convolution surfaces: a comparative analysis. Vis. Comput. 1999;15:171–182.
- 151.Kaur M., Kaur J., Kaur J. Survey of contrast enhancement techniques based on histogram equalization. Int. J. Adv. Comput. Sci. Appl. 2011;2.
- 152.Motwani M.C., Gadiya M.C., Motwani R.C., Harris F.C. Survey of image denoising techniques. Proceedings of GSPX. 2004;27:27–30.
- 153.Dobbe J.G.G., Streekstra G.J., Atasever B., Van Zijderveld R., Ince C. Measurement of functional microcirculatory geometry and velocity distributions using automated image analysis. Med. Biol. Eng. Comput. 2008;46:659–670. doi: 10.1007/s11517-008-0349-4.
- 154.Deng H., Zhang W., Mortensen E., Dietterich T., Shapiro L. 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2007. Principal curvature-based region detector for object recognition; pp. 1–8.
- 155.Bezemer R., Dobbe J.G., Bartels S.A., Boerma E.C., Christiaan Boerma E., Elbers P.W.G., Heger M., Ince C. Rapid automatic assessment of microvascular density in sidestream dark field images. Med. Biol. Eng. Comput. 2011;49:1269–1278. doi: 10.1007/s11517-011-0824-1.
- 156.Tam J., Martin J.A., Roorda A. Noninvasive visualization and analysis of parafoveal capillaries in humans. Invest. Ophthalmol. Vis. Sci. 2010;51:1691–1698. doi: 10.1167/iovs.09-4483.
- 157.Geyman L.S., Garg R.A., Suwan Y., Trivedi V., Krawitz B.D., Mo S., Pinhas A., Tantraworasin A., Chui T.Y.P., Ritch R., Rosen R.B. Peripapillary perfused capillary density in primary open-angle glaucoma across disease stage: an optical coherence tomography angiography study. Br. J. Ophthalmol. 2017;101:1261–1268. doi: 10.1136/bjophthalmol-2016-309642.
- 158.Demir S.U., Hakimzadeh R., Hargraves R.H., Ward K.R., Myer E.V., Najarian K. An automated method for analysis of microcirculation videos for accurate assessment of tissue perfusion. BMC Med. Imaging. 2012;12:37. doi: 10.1186/1471-2342-12-37.
- 159.Cheng C., Lee C.W., Daskalakis C. A reproducible computerized method for quantitation of capillary density using nailfold capillaroscopy. J. Vis. Exp. 2015;104:e53088. doi: 10.3791/53088.
- 160.Dai G., He W., Xu L., Pazo E.E., Lin T., Liu S., Zhang C. Exploring the effect of hypertension on retinal microvasculature using deep learning on East Asian population. PLoS One. 2020;15:e0230111. doi: 10.1371/journal.pone.0230111.
- 161.Hariyani Y.S., Eom H., Park C. DA-CapNet: dual attention deep learning based on U-Net for nailfold capillary segmentation. IEEE Access. 2020;8:10543–10553.
- 162.Hu J., Shen L., Sun G. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2018. Squeeze-and-excitation networks; pp. 7132–7141.
- 163.Woo S., Park J., Lee J.-Y., Kweon I.S. Proceedings of the European Conference on Computer Vision (ECCV). The Computer Vision Foundation; 2018. CBAM: convolutional block attention module; pp. 3–19.
- 164.Prentašić P., Heisler M., Mammo Z., Lee S., Merkur A., Navajas E., Beg M.F., Šarunić M., Lončarić S. Segmentation of the foveal microvasculature using deep learning networks. J. Biomed. Opt. 2016;21:075008. doi: 10.1117/1.JBO.21.7.075008.
- 165.Nivedha R., Brinda M., Suma K., Rao B. 2016 International Conference on Circuits, Controls, Communications and Computing (I4C). IEEE; 2016. Classification of nailfold capillary images in patients with hypertension using non-linear SVM; pp. 1–5.
- 166.Javia P., Rana A., Shapiro N., Shah P. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE; 2018. Machine learning algorithms for classification of microcirculation images from septic and non-septic patients; pp. 607–611.
- 167.Tama A., Mengko T.R., Zakaria H. 2015 4th International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering (ICICI-BME). IEEE; 2015. Nailfold capillaroscopy image processing for morphological parameters measurement; pp. 175–179.
- 168.Ye F., Yin S., Li M., Li Y., Zhong J. In vivo full-field measurement of microcirculatory blood flow velocity based on intelligent object identification. J. Biomed. Opt. 2020;25:016003. doi: 10.1117/1.JBO.25.1.016003.
- 169.Abdou M.A.H., Ferreira P., Jul E., Truong T.T. CapillaryX: a software design pattern for analyzing medical images in real-time using deep learning. arXiv. 2022. Preprint. doi: 10.48550/arXiv.2204.08462.


