Journal of Experimental Botany. 2024 Oct 4;75(21):6683–6703. doi: 10.1093/jxb/erae395

Machine learning-enabled computer vision for plant phenotyping: a primer on AI/ML and a case study on stomatal patterning

Grace D Tan 1,2, Ushasi Chaudhuri 3, Sebastian Varela 4,5, Narendra Ahuja 6,7, Andrew D B Leakey 8,9,10,11
Editor: Tracy Lawson 12
PMCID: PMC11565210  PMID: 39363775

Abstract

Artificial intelligence and machine learning (AI/ML) can be used to automatically analyze large image datasets. One valuable application of this approach is estimation of plant trait data contained within images. Here we review 39 papers that describe the development and/or application of such models for estimation of stomatal traits from epidermal micrographs. In doing so, we hope to provide plant biologists with a foundational understanding of AI/ML and summarize the current capabilities and limitations of published tools. While most models show human-level performance for stomatal density (SD) quantification at superhuman speed, they are often likely to be limited in how broadly they can be applied across phenotypic diversity associated with genetic, environmental, or developmental variation. Other models can make predictions across greater phenotypic diversity and/or additional stomatal/epidermal traits, but require significantly greater time investment to generate ground-truth data. We discuss the challenges and opportunities presented by AI/ML-enabled computer vision analysis, and make recommendations for future work to advance accelerated stomatal phenotyping.

Keywords: Artificial intelligence, computer vision, machine learning, object detection, plant biology, plant phenotyping, segmentation, stomata, stomatal density


We present a primer on AI/ML for plant biologists and a review of the use of machine learning for automated analysis of images for rapid phenotyping of stomatal patterning.

Introduction

Artificial intelligence and machine learning (AI/ML) can be leveraged to improve many tools used in biological research (Wang et al., 2023). One key application of AI/ML is in computer vision tools that can extract visual features from images to estimate plant traits (Grinblat et al., 2016; Mohanty et al., 2016). Examples of such tools range in application across biological scales of organization from the molecular to cellular, tissue, organ, plant, canopy, and ecosystem (Bai et al., 2018; Lee et al., 2019; Zhou et al., 2019; Stringer et al., 2021; Ott and Lautenschlager, 2022; Song and Wang, 2023). Stomata are microscopic pores in the plant epidermis that regulate the exchange of carbon dioxide and water vapor between the atmosphere and internal tissues (Willmer and Fricker, 1996). Stomata therefore exert considerable influence on carbon and water fluxes of plants, ecosystems, and the earth system (Hetherington and Woodward, 2003), while also regulating water use efficiency (Leakey et al., 2019), temperature (Chaves et al., 2003), pathogen entry (Melotto et al., 2008), and uptake of air pollutants (Ainsworth et al., 2008). Stomatal conductance describes the extent to which stomata allow diffusion of gases across the epidermis, and is determined by the pattern of stomatal complexes plus the aperture of the stomatal pores (Franks et al., 2009). Stomatal aperture is itself a function of the size of the stomatal complex and the degree to which the guard cells, and subsidiary cells in certain species, alter shape to open the pore (Franks and Farquhar, 2007). We define the elements of stomatal patterning as the number, size, and relative positions of stomatal complexes in the epidermis. There is significant interest in understanding the structure–function relationships of stomata. Due to their functional significance, and their convenient location on the plant epidermis, stomata are also model systems for the study of signal transduction and cell fate/development (Pillitteri and Torii, 2012).

Stomata are found with varying degrees of regularity embedded in a matrix of epidermal cells. More than five cell classes are often present, including different combinations of pavement cells of various shapes, prickles, microhairs, macrohairs, silica cells, bulliform cells, and stomatal complexes (Freeling, 1992; Ellison et al., 2020). Stomata and other epidermal cell types vary greatly in number and appearance among and within species, as well as in response to environmental variation (Braybrook and Kuhlemeier, 2010; Vőfély et al., 2019). Even examining a portion of the biological diversity in epidermal patterning reveals significant variation in size, density, cell morphology, and distribution across the epidermis (Fig. 1). Most notably, the guard cells that surround the pore of the stomata in dicotyledonous species are ‘kidney-shaped’ and sometimes accompanied by varying numbers and shapes of subsidiary cells (Fig. 1A, B, D, F, G). Guard cells that surround the pore of the stomata in monocotyledonous species are ‘dumbbell-shaped’ and are always accompanied by a pair of subsidiary cells (Fig. 1C, E, H). Because stomata include multiple structures, they are referred to as a ‘stomatal complex’ to clarify the identification of the entire unit including the pore, a pair of guard cells, and subsidiary cells if they are present. When referring to the reviewed papers, we use this terminology to distinguish measurements such as stomatal complex area (SCA), namely the measurement of the area of the whole structure including the pore and guard/subsidiary cells, from measurements such as pore size that do not include the area of the cells that surround pores.

Fig. 1.

Biological diversity of plant stomata. (A) Aglaia cucullata, tropical tree species (reprinted from Dey et al., 2023, with permission from Elsevier). (B) Arabidopsis thaliana (reprinted from Li et al., 2022, by permission of the American Society of Plant Biologists). (C) Zea mays (Aono et al., 2021). (D) Soybean (Sultana et al., 2021). (E) Oil palm (Kwong et al., 2021). (F) Quinoa (Razzaq et al., 2021). (G) Example of an open dicotyledonous stomatal complex. (H) Example of an open monocotyledonous stomatal complex. gc=guard cell (green), sc=subsidiary cell (blue), a=aperture.

This paper aims to review recent progress in phenotyping stomatal patterning, with a focus on application of ML to automate assessment of stomatal patterning traits from microscope images. This area of research has emerged and advanced quickly over the last decade. It is an important development because a wide range of biological questions can be better addressed if we can rapidly and accurately assess stomatal density, size, shape, and aperture. The search terms ‘stoma* or guard cell’ and ‘machine learn* or artificial intelligence’ were applied in the online Web of Science™ tool to search for relevant literature. Initial search results, plus relevant papers that either cited those studies or were cited by them, generated a set of 39 papers for review. This includes work on a wide variety of species (Fig. 2A), data collection approaches (Fig. 2B, C), and traits (Fig. 2D). This review is structured to discuss phenotyping stomatal patterning in terms of: (i) methods for tissue sampling, microscopy, and image analysis; (ii) a primer on AI/ML; (iii) the biological context of current ML tools; (iv) recently developed computer vision tools; (v) common challenges to ML-enabled tools; and (vi) future directions.

Fig. 2.

Summary of the 39 papers reviewed. (A) Histogram of the 56 unique species used for model training. The number in parentheses represents the number of species in that category. (B) Pie chart of sampling techniques. The number in parentheses represents the number of papers in that category. (C) Pie chart of imaging methods. The number in parentheses represents the number of papers in that category. (D) Histogram of trait output. SD, stomatal density; SCL, stomatal complex length; SCW, stomatal complex width; SCA, stomatal complex area; SI, stomatal index; PCD, pavement cell density; PCA, pavement cell area; PCL, pavement cell length; PCW, pavement cell width.

Methods for tissue sampling, microscopy, and image analysis of stomatal patterning

Tissue sampling

Quantification of stomatal traits can be time-consuming and difficult at a large experimental scale. The most common sample preparation method, utilized in 17 of the reviewed papers, involves making an impression of the epidermal surface by varnish peel, glue peel, or other method (Fig. 2B). In this methodology, it is usually the inverse of the leaf surface at the time the impression material is applied that is captured and analyzed. This method is non-destructive and produces replicas of the epidermis that can be stored and easily preserved long term. However, this method requires careful manual manipulation and is susceptible to error in the process of applying or removing the impression material, which can reduce image quality (Jayakody et al., 2017; Costa et al., 2021; Dey et al., 2023). This method may also be incompatible with the epidermal topography of some species. Notably, gymnosperms often have stomata sunken into crypt-like depressions, which may not be captured by varnish peels (Fetter et al., 2019), and some species have particularly hairy leaves that obscure the underlying cells of interest (Meeus et al., 2020). Only slightly less commonly, intact leaf samples are imaged directly, either after being removed from the plant or still attached. Avoiding time spent on tissue or sample preparation can make these methods rapid choices. Imaging these samples may encounter the same difficulties as impressions when structures are occluded from view by hairs, but pre-processing to remove hairs may allow for imaging. Another methodology is epidermal peels, whereby one side of the epidermis and the mesophyll is carefully pulled or scraped away to leave an isolated epidermis for imaging. Lastly, some papers employ a clearing treatment to make a leaf sample transparent, but it can take hours to days to fully process the samples (Sultana et al., 2021). ‘Cleared leaf sample’ represents a distinct category from ‘leaf sample’ in both preparation time and compatibility with imaging methods. On the whole, these common sampling techniques have changed very little for over a century (Biscoe, 1872).

Microscopy methods

The various forms of light microscopy have been more commonly used (42 studies) than electron microscopy (7 studies) to image stomata and surrounding epidermis (Fig. 2C). Light microscopy included bright-field, confocal, fluorescence, dark-field, and differential interference contrast (DIC) microscopy methods. When the specific imaging modality was unclear, but a light microscope was named, papers were assigned to the ‘light (unspecified)’ category. Light microscopes capable of capturing bright-field images can be sufficient to image varnish peels, epidermal peels, or cleared leaf samples, but can be limited in imaging leaf samples directly. Optical tomography is a subset of confocal microscopy that has recently been introduced as a method to rapidly scan the leaf surface without the need for any sample preparation methods beyond sticking the sample to a microscope slide with double-sided tape (Haus et al., 2015; Ferguson et al., 2021, 2024; Prakash et al., 2021; Xie et al., 2021; Lunn et al., 2024). Although the instrument is relatively expensive, this methodology is promising because it is compatible with the fastest sampling method (i.e. direct measurement of leaves without the need for time-consuming tissue preparation), is cheap to operate, rapid, and does not destroy the sample imaged. Scanning electron microscopy (SEM) is even more expensive in terms of equipment cost, but achieves greater resolution than all light microscopy methods when precision is needed.

Image analysis methods

Phenotyping stomatal traits from micrographs has historically relied heavily upon manual data collection. Manual measurement of images to quantify stomatal density, aperture, and size aided by software such as ImageJ remained the predominant method even as classical non-ML computer vision approaches made semi-automated measurements possible (Karabourniotis, 2001; Sanyal et al., 2008; Laga et al., 2014). A coordinate value for each stoma in an image will allow counting to determine stomatal density and positional information to assess spatial arrangement. This has proven to be a burdensome but tractable task in many cases. However, additional manual annotation to assess the length, width, and area of the entire stomatal complex (SCL, SCW, and SCA) or individual guard cells/subsidiary cells (Fig. 2) adds labor that is impractical for large numbers of samples (Xie et al., 2021). Assessment of cell size for epidermal cells adds an additional order of magnitude of work, making automated analysis essential in almost all cases (Xie et al., 2021). As described below, the emergence of AI/ML methods has been transformational in allowing automated analysis of large image sets.

A primer on AI/ML

Here we aim to introduce AI/ML methods, in general, to biologists before discussing the specific AI/ML methods that have been applied to automating the measurement of stomatal traits in epidermal micrographs. AI refers to the process of imparting aspects of human intelligence to machines so that they can mimic human behavior for problem-solving and decision-making. ML is a subfield of AI wherein such decision-making is represented as statistical functions, and the decision-making is learned by training on real data, also known as ground-truth (Shalev-Shwartz and Ben-David, 2014; Goodfellow et al., 2016). ML algorithms can be broadly categorized into supervised and unsupervised learning algorithms by whether they are trained on labeled or raw data, respectively. An example of a labeled image would be a raw image of a leaf epidermis on which the perimeters of all stomatal complexes have been drawn on by a human. The papers reviewed here exclusively feature supervised algorithms, as no unsupervised models for stomatal phenotyping exist yet to our knowledge. This primer progresses through: (i) describing convolutional neural network models as image analysis tools; (ii) classification, object detection, semantic segmentation, and instance segmentation as distinct image analysis tasks; (iii) the steps involved in training, validation, and testing of models; (iv) different model learning strategies (training from scratch and transfer learning); and finally (v) the metrics used to assess model performance.

Neural networks

The use of ML for determining stomatal traits often involves neural network (NN) modeling. NNs consist of interconnected layers of nodes (neurons) inspired by the human brain’s structure and function, allowing them to learn to recognize patterns in data. They are used for a variety of tasks, including classification, regression, and clustering. However, a specialized type of feedforward NN known as the convolutional neural network (CNN) has been particularly successful in processing and analyzing grid-like data, such as the images used for analysis of stomata. A CNN takes an image as its input, processes it through various layers, and outputs a prediction that corresponds to the data specified in the training set. CNNs have been fundamental to solving computer vision tasks such as object detection and segmentation (i.e. partitioning an image into discrete groups of cells or classes of cells). They comprise multiple layers, each with trainable parameters, including the following (a minimal code sketch follows the list).

  • (i) Convolutional layers that utilize filters and kernels to generate a more abstract representation through a feature map. The filter moves across the image like a scanner and creates a feature map (i.e. meaningful descriptor that links input and output prediction). This is the core and distinctive component of CNNs. This operation is key to the network’s ability to automatically extract spatial hierarchies and features, such as edges, textures, and shapes, from images.

  • (ii) Pooling layers down-sample feature maps by summarizing the presence of features in patches of the feature map. This reduces the data’s dimensionality and the computational cost.

  • (iii) Fully connected layers link neurons in one layer to neurons in another layer. They take outputs from other layers and classify pixels, calculating scores for each class label.
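
To make the three layer types concrete, the sketch below chains one convolutional layer, one pooling layer, and one fully connected layer into a tiny classifier. This is a minimal illustration assuming PyTorch; the class name, layer sizes, and the stoma/background task are illustrative and not taken from any reviewed model.

```python
# A minimal sketch, assuming PyTorch; layer sizes and the two-class task are illustrative.
import torch
import torch.nn as nn

class TinyEpidermisCNN(nn.Module):
    def __init__(self, n_classes=2):                            # e.g. 'stoma' vs 'background' patch
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # (i) convolutional layer (8 filters)
        self.pool = nn.MaxPool2d(2)                              # (ii) pooling layer (down-sampling)
        self.fc = nn.Linear(8 * 32 * 32, n_classes)              # (iii) fully connected layer

    def forward(self, x):              # x: (batch, 1, 64, 64) grey-scale patches
        x = torch.relu(self.conv(x))   # feature maps: (batch, 8, 64, 64)
        x = self.pool(x)               # down-sampled: (batch, 8, 32, 32)
        x = x.flatten(1)               # one feature vector per image
        return self.fc(x)              # class scores

scores = TinyEpidermisCNN()(torch.randn(4, 1, 64, 64))           # -> tensor of shape (4, 2)
```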

Supervised training of a CNN involves using a labeled dataset to adjust the network’s parameters (weights and biases) through a process called backpropagation. The training data are used to compute the ‘loss’, which measures the difference between the predicted output and the true labels. The loss is minimized by updating the parameters using optimization algorithms. Additionally, validation data are used during training to tune hyperparameters, such as the learning rate, and to monitor the model’s performance. This helps in preventing overfitting and ensures that the model generalizes well to unseen data. By iteratively adjusting the parameters and hyperparameters, CNNs become highly effective at accurately analyzing and interpreting image data. Deep learning (DL) is a subfield of ML that uses particularly deep and large NNs.
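
The epoch structure described above can be sketched as follows. This is a minimal example assuming PyTorch; the stand-in network and the DataLoader names train_loader and val_loader are hypothetical placeholders for a real dataset.

```python
# A minimal sketch of one training epoch plus a validation pass, assuming PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 2))      # stand-in network
criterion = nn.CrossEntropyLoss()                                # the 'loss'
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)        # learning rate: a hyperparameter

def run_epoch(train_loader, val_loader):
    model.train()
    for images, labels in train_loader:               # labels are the manual ground-truth
        optimizer.zero_grad()
        loss = criterion(model(images), labels)       # discrepancy between prediction and truth
        loss.backward()                               # backpropagation
        optimizer.step()                              # update weights and biases
    model.eval()
    with torch.no_grad():                             # validation images never update the weights
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    return val_loss                                   # monitored across epochs to tune hyperparameters
```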

The major reason for the success of NN-based learning techniques is the ability to learn from the data themselves. After analyzing large amounts of data, learned models can automatically identify important features/aspects of the data which can help in formulating new hypotheses and finding common patterns that can be extended over larger datasets, which otherwise would have been unrealistic to do manually.

Classification, object detection, semantic segmentation, and instance segmentation

Four common tasks in ML for image analysis are classification, object detection, semantic segmentation, and instance segmentation. At the scale of many biological experiments, any one of these tasks can take a significant time to accomplish manually.

Classification describes the process of grouping images or parts of images into a set of classes defined by the user. A single image can be associated with either one class label or multiple class labels, depending on the task at hand. In stomatal research, classification is used to identify plant species based on an image of the leaf epidermis (Andayani et al., 2020; Dey et al., 2023), to identify cropped image regions as containing or lacking stomata (Aono et al., 2021), or to classify stomata identified by object detection as in the open-pore or closed-pore states (Razzaq et al., 2021; Li et al., 2023).

Object detection identifies occurrences of unique instances of an object and places a bounding box (the minimum rectangle needed to contain the object) around each instance. Each instance is assigned a distinct identifier under its respective class label. Semantic segmentation involves the grouping of individual pixels into user-defined classes but, crucially, semantic segmentation alone does not differentiate between any two unique instances of these objects within a class.

Instance segmentation can be conceptualized as a combination of object detection and semantic segmentation, identifying all the pixels belonging to unique instances of an object in an image. Instance segmentation of stomata can be achieved by semantic segmentation of an entire image, followed by a post-processing step to label continuous components with unique instance identifiers. This approach is relatively straightforward because stomata rarely lie directly adjacent to each other. Instance segmentation can also be achieved by semantic segmentation of objects in bounding boxes identified by object detection. This form of instance segmentation is likely to be less error prone than using semantic segmentation output to identify instances by counting non-connecting regions. Object detection, semantic segmentation, and instance segmentation tactics are employed in various ways in the reviewed papers to estimate quantitative traits such as the size or number of structures.
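
As an illustration of the first approach, a semantic segmentation mask can be converted into instances by connected-component labelling. The sketch below assumes SciPy; the toy mask stands in for real model output.

```python
# A minimal sketch, assuming SciPy; 'mask' stands in for a model's binary 'stoma' output.
import numpy as np
from scipy import ndimage

mask = np.zeros((100, 100), dtype=bool)
mask[10:20, 10:30] = True                     # two toy, non-touching 'stomata'
mask[60:75, 50:70] = True

labels, n_instances = ndimage.label(mask)     # each connected region gets a unique instance id
areas = ndimage.sum(mask, labels, index=range(1, n_instances + 1))   # pixel area per instance
print(n_instances, areas)                     # -> 2 instances; the count feeds a density estimate
```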

Training, validation, and testing of ML models

A simplified schematic of the process of developing an ML model for object detection is depicted in Fig. 3. After sample collection and image acquisition, the entire dataset is typically divided into three sub-datasets, namely the ‘training dataset’, ‘validation dataset’, and the ‘test dataset’ (pink, blue, and yellow squares, respectively, in Fig. 3). Division of images among the subsets is most commonly done by random sampling. However, if prior information is available about the distribution of the trait of interest, structured sampling may be performed to ensure that the distribution of data is similar across the three sub-datasets. All three datasets require human measurements of the desired trait. While there is inconsistency in the literature, in this review ‘training dataset’ will refer to images utilized in the training process to initially learn features of interest. ‘Validation dataset’ will refer to images utilized in the training process to evaluate model performance at successive iterations. In the process of training, the model first learns from the training dataset, then adjusts parameters according to evaluation on the validation dataset to improve performance on the training dataset without simply memorizing its features. Training is an iterative process consisting of a number of epochs, where a single epoch involves prediction, evaluation, and parameter adjustment (gray rectangular box in Fig. 3). Training is complete when the losses (discrepancy between predictions and ground-truth) from the training and validation datasets converge, indicating optimal model parameters. The training dataset is always larger than the validation dataset, but the ratio of images between them ranges dramatically from 2:1 to 8:1 (training:validation) in the reviewed papers. ‘Test dataset’ will refer to images not used in the training process (i.e. novel images not contained in the training or validation datasets), on which the selected final model is tested by comparison with manual data. In theory, all three of these datasets display the same data distribution, but it is common to deploy the model on additional test datasets to evaluate model performance on images that increasingly differ from images in the training and validation datasets. It is important to note that some reviewed papers do not clearly distinguish the ‘test dataset’ from the ‘validation dataset’.
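
In code, the initial random division into the three sub-datasets can be as simple as the following sketch; the file names and the 70/20/10 ratio are hypothetical.

```python
# A minimal sketch of a random training/validation/test split; names and ratio are illustrative.
import random

image_files = [f"micrograph_{i:03d}.png" for i in range(100)]   # hypothetical annotated images
random.seed(0)
random.shuffle(image_files)

n_train, n_val = int(0.7 * len(image_files)), int(0.2 * len(image_files))
train_set = image_files[:n_train]                  # used to learn features
val_set = image_files[n_train:n_train + n_val]     # used to monitor and tune during training
test_set = image_files[n_train + n_val:]           # held back entirely until the model is final
```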

Fig. 3.

Workflow illustrating the process of building a machine learning model. Plant material is sampled and imaged using microscopy. These images are divided into training, validation, and test datasets. Ground-truth data are manually generated for all three datasets. Optional filtering is applied to all images to enhance image quality, and data augmentation may also be utilized to enrich the training dataset. Training involves the cycle of the model, making predictions on training and validation datasets, assessing performance through comparison with ground-truth data, and adjustment of the model based on results from only the validation dataset, constituting one epoch. Training is terminated after many epochs, and the finalized model can be applied to a test dataset and assessed for performance.

Data augmentation

Data augmentation describes the process of applying transformations to model training data to increase the size of the training dataset and to improve performance on visual diversity. Two categories of data augmentations are described in Casado-García et al. (2019). Position-invariant techniques include altering the color, brightness, or contrast of training images, and can reuse existing manual ground-truth data for both class and object locations/boundaries. Position-variant data augmentation techniques such as rotations, crops, and reflections can use existing manual ground-truth class data, but require any manual ground-truth data describing object locations/boundaries to be adjusted to reflect the new position of the object in the image. Sixteen reviewed papers utilize data augmentation to enhance training datasets, and both position-variant and -invariant techniques are well represented (Supplementary Table S1). Most commonly researchers apply rotations, crops, blurring, or changes to color, but data augmentation methods are not always specifically described.
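
The distinction between the two categories can be illustrated with a brightness change versus a horizontal flip. The sketch below uses NumPy; the micrograph and bounding box are toy values.

```python
# A minimal sketch with NumPy; the micrograph and bounding box are toy values.
import numpy as np

image = np.random.rand(200, 300)                 # hypothetical grey-scale micrograph (H, W)
box = (120, 50, 160, 80)                         # one annotated stoma: (x_min, y_min, x_max, y_max)

brighter = np.clip(image * 1.2, 0, 1)            # position-invariant: annotation reused unchanged
flipped = image[:, ::-1]                         # position-variant: left-right reflection

h, w = image.shape
x_min, y_min, x_max, y_max = box
flipped_box = (w - x_max, y_min, w - x_min, y_max)   # the box must be mirrored to stay valid
```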

Model learning strategies

When learning from visual data, two primary training strategies are employed: training from scratch and transfer learning, including fine-tuning. Training from scratch involves initializing the NN with random weights and training it on a large, labeled dataset, allowing the network to learn features directly from the data. Alternatively, transfer learning leverages a pre-trained model, which has already learned useful features from a large dataset, and adapts it to the new task. Fine-tuning takes this a step further by training the pre-trained model on the new dataset, slightly updating some or all of its layers’ parameters (i.e. weights) to better capture the specific characteristics of the new data. These strategies help to efficiently utilize computational resources and improve model performance, especially when dealing with limited labeled data.
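
A typical transfer-learning recipe can be sketched as follows, assuming torchvision; the ResNet-18 backbone and the two-class head are illustrative rather than drawn from any reviewed model.

```python
# A minimal sketch of transfer learning/fine-tuning, assuming torchvision; details are illustrative.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")     # pre-trained weights (training from scratch
                                                     # would instead start from random weights)
for param in model.parameters():
    param.requires_grad = False                      # freeze the pre-trained feature extractor

model.fc = nn.Linear(model.fc.in_features, 2)        # new task-specific head, e.g. stoma/no stoma
# Fine-tuning: optionally unfreeze some deeper layers and retrain them with a small learning rate.
```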

Performance metrics

There are a number of metrics that can be used to assess model performance. It is important to note that these metrics have very specific definitions in the context of ML performance. Some commonly used metrics for object detection include precision, recall, and F1-score, depicted visually in Fig. 4. Terms used to define these performance metrics include TP=true positive, TN=true negative, FP=false positive, and FN=false negative. For these metrics, values range from 0 to 1, with a value nearer 1 representing excellent performance, or nearer 100% if expressed as a percentage.

Fig. 4.

Recall and precision in object detection. Gray ovals indicate the presence of a single stoma identified in the ground-truth data. Yellow circles indicate model predictions for the location of stomata. Blue rectangles represent false-positive predictions, where there is a model prediction without a ground-truth object. Pink rectangles represent false-negative predictions, where there is no model prediction on a ground-truth object. Stomata with yellow circles and no rectangles indicate true positives. (A) Example performance of a model with high precision and high recall. (B) Example performance of a model with low precision and high recall. (C) Example performance of a model with high precision and low recall. (D) Example performance of a model with low precision and low recall.

Precision, recall, and F1-score

Precision is defined by the proportion of identified objects that are correctly identified, with a high value indicating few false positives. It is given as:

Precision=TP/(TP+FP)

Recall is defined by the proportion of ground-truth objects that are correctly identified, with a high value indicating few false negatives. It is given as:

Recall=TP/(TP+FN)

F1-score synthesizes precision and recall to assign a summary value to performance that considers both false positives and false negatives.

F1-score=(2×Precision×Recall)/(Precision+Recall)
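
A worked example with hypothetical counts makes the behaviour of these metrics concrete.

```python
# Hypothetical counts for one test image: 45 stomata detected correctly, 5 spurious
# detections, and 5 stomata missed.
tp, fp, fn = 45, 5, 5

precision = tp / (tp + fp)                              # 0.90: few false positives
recall = tp / (tp + fn)                                 # 0.90: few false negatives
f1 = 2 * precision * recall / (precision + recall)      # 0.90: balances the two
```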

Average precision (AP) is another commonly used metric that describes the area under a precision–recall curve and thus represents performance relative to both false positives and false negatives.

Accuracy

Accuracy is defined by how often the model makes correct predictions, and utilizes both false positives and false negatives. It is defined as:

Accuracy=(Correct predictions)/(Total predictions)=(TP+TN)/(TP+TN+FP+FN)

Intersection over union

Intersection over union (IoU) evaluates the overlap between the bounding boxes of the predicted output and the ground-truth output. This provides insight into whether a detection is valid (true positive) or not (false positive). It also helps in quantifying the alignment and offset between the predictions and the ground-truth.

IoU=(Area of overlap)/(Area of union)
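
For axis-aligned bounding boxes, IoU can be computed directly from the box coordinates; the sketch below is minimal and the example boxes are arbitrary.

```python
# A minimal sketch; boxes are (x_min, y_min, x_max, y_max) in pixels and the values are arbitrary.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h                                             # area of overlap
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter # area of union
    return inter / union if union else 0.0

print(iou((10, 10, 50, 30), (20, 15, 60, 35)))   # partially overlapping prediction and ground-truth
# A threshold (commonly IoU >= 0.5) is then used to decide whether a detection is a true positive.
```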

Mean average precision

A dataset that contains classes of data where one class makes up the great majority (i.e. >80–90%) of the total dataset, is referred to as imbalanced. In this instance, classifying all the predictions as a single majority class would also lead to a high IoU value. This would be deceptive as a quantitative measure. Mean average precision (mAP) provides an aggregate measure of how well the model performs across all object classes. Calculating this metric involves computing the average precision by determining the area under the precision–recall curve for each class. Finally, the mean of AP values across all classes is obtained.
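
A simplified sketch of AP for a single class is shown below; standard benchmarks use interpolated precision, but the principle is the same, and the detection outcomes here are invented.

```python
# A simplified sketch of average precision for one class; the detection outcomes are invented.
import numpy as np

# Detections sorted by descending confidence; 1 = matched a ground-truth stoma, 0 = false positive.
is_tp = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 0])
n_ground_truth = 8

tp_cum = np.cumsum(is_tp)
fp_cum = np.cumsum(1 - is_tp)
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / n_ground_truth
ap = np.trapz(precision, recall)      # area under the precision-recall curve
# mAP is simply the mean of AP values computed in this way for every object class.
```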

While the most frequently utilized performance metrics are defined above, other common statistics such as error rate and correlation coefficient also appear. Each performance metric speaks to different abilities of any given model, and reporting only one available metric can obscure the nuance of model performance. A model with extremely high precision may rarely identify a non-stomatal object as stomata; however, without accuracy or recall, we know nothing about how frequently it misses stomata (Fig. 4C). A model with perfect recall but low precision may never miss stomata, but the false positives mean that the derived stomatal density (SD) would be drastically overestimated (Fig. 4B). Additionally, some of these values are only appropriate for certain tasks. For example, performance of object detection of stomata or pavement cells can be quantified by any metric that utilizes false/true positive/negative values, while performance on semantic segmentation of pore length cannot use these values, and is often described better with the coefficient of determination (R2) and error rate between the model value and the ground-truth value.

It is important to consider that even the same metric reported for two different models may not be directly comparable if they are not applied to the same dataset. The number of images used to test model performance varies among papers, and in some cases this number is not clearly stated. Furthermore, even if metrics were reported with the same number of images, depending on the range of stomatal densities, these values may reflect a vastly different number of instances. More broadly, the makeup of the dataset from which these values are calculated is often not clear. It is important that testing be done on images that are completely independent from the training process. Strong performance across many images that represent a wider diversity of appearances is obviously more valuable. For these various reasons, it is not easy to draw strong conclusions about which methods/approaches perform best just by comparing performance metrics, even across studies that share methods for data collection or analysis.

The biological context of current ML tools for phenotyping stomatal patterning

To date, at least 39 studies (Supplementary Table S1) have explored how data analysis can be accelerated by applying ML tools to microscopic images of the leaf epidermis for rapid phenotyping. Before discussing the specific ML methods and their performance, it is important to note the context within which they have been applied. This is necessary because AI/ML methods are typically highly context dependent; that is, the models need to be trained with data, often in large amounts, and their ability to perform a task diminishes rapidly if they are then provided with test data that fall out of the range of what was used for training. New contexts where an existing tool might fail include samples from new sampling approaches, new imaging modalities, new genotypes, new species, new environments, or new developmental stages, namely any factor that alters the appearance of the epidermis in the image. The consequences of this are very significant given the propensity for variation in biological systems.

Study species

To date, application of AI/ML tools to analyze images of stomata has largely focused on a modest number of plant species. The frequency with which 57 specifically named species appear in a model’s training dataset, with an additional category for models that use a non-specific epidermal dataset of many species, is shown in Fig. 2A. It is important to note, though, that even if there is a model for a researcher’s species of interest, the published model may not collect the trait data of interest and/or the sampling methods might not be compatible with the current researcher’s available resources. Of the 39 papers reviewed here, the greatest number focused on the study of maize (six papers), wheat (five papers), balsam poplar (four papers), Arabidopsis (three papers), soybean (three papers), and ginkgo (three papers). There does not appear to be a focus on a particular type of stomata, but rather there is the broader tendency to focus research on species that are model systems and/or economically important.

Tissue sampling strategy

Even within monocots sharing common stomatal complex morphology, the sampling strategy can have a significant effect on the appearance of the image and, by extension, the degree to which a tool can identify structures accurately, as shown in Fig. 5. Images in Fig. 5A–C are all of maize, but their differences in sampling and imaging methods make it likely that a model trained on one would fail on the others. This is due, in part, to the fact that these methods are often chosen to facilitate the quantification of a particular trait. For example, Liang et al. (2022) utilized portable microscopes to image maize leaves in situ for pore size (Fig. 5B), which could not be calculated from the closed stomata in confocal images of leaf samples from Saponaro et al. (2017) (Fig. 5A).

Fig. 5.

Grass images acquired through different sampling and imaging methods. (A) Maize leaf sample imaged by a confocal microscope (reprinted from Saponaro et al., 2017, with permission from IEEE Proceedings). (B) Maize live leaf imaged by a portable light microscope (Liang et al., 2022). (C) Maize varnish peel imaged by a light microscope (reprinted from Zhang et al., 2022, with permission from Elsevier). (D) Rice leaf sample imaged by SEM (reprinted from Bhugra et al., 2018, with permission from Proceedings of the Institute of Electrical and Electronics Engineers).

Phenotyping stomatal density, size, conductance, and other epidermal cells

SD is the simplest and most frequent trait collected in the reviewed papers, with 33 of the 39 reviewed papers describing models that do so (Fig. 2D). SD is usually calculated by counting the number of stomata a model predicts based on object detection or instance segmentation. It is important to note that accurate estimates of SD are not achieved if stomata lying partially in the image are not dealt with correctly. A simple and common approach is to only count stomata lying on one of the two vertical edges and one of the two horizontal edges. However, papers with a strong image analysis focus sometimes do not implement this approach. Fortunately, the correction can be applied after ML/AI operations are complete. However, the AI/ML tool does need to be trained to identify partial stomata on image edges if estimates of SD are desired. If the size of stomatal complexes is the focus, then partial stomatal complexes on the edge of images are ignored.
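
As an illustration, the edge-counting rule described above can be applied to detected bounding boxes after inference. The sketch below is illustrative only; the choice of which edges to keep and the field-of-view area are arbitrary.

```python
# An illustrative sketch of edge-corrected stomatal counting; edge choice and field area are arbitrary.
def count_stomata(boxes, width, height):
    """boxes: detected stomata as (x_min, y_min, x_max, y_max) in pixels."""
    count = 0
    for x_min, y_min, x_max, y_max in boxes:
        touches_right = x_max >= width - 1
        touches_bottom = y_max >= height - 1
        if not (touches_right or touches_bottom):   # keep partial stomata on the left/top edges only
            count += 1
    return count

n = count_stomata(boxes=[(0, 40, 30, 70), (200, 150, 250, 199)], width=250, height=200)
density = n / 0.15                                  # stomata per mm2, for a 0.15 mm2 field of view
```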

The length and width of bounding boxes or an ellipse fit to match the perimeter of a stomatal complex can be used to estimate SCL and SCW, but to be reliable this method requires that the long axis of the stomata is consistently parallel to the long axis of the bounding box, unlike in Fig. 6A. More advanced instance segmentation tools predict exact pixels defining the size and shape of pores (Fig. 6B), and stomatal complexes (Fig. 6C), eliminating the extrapolation of these measurements from bounding boxes or ellipse fits. Fewer than 12 papers reviewed collected pore or complex size data (Fig. 2D). However, this information is becoming more common and can be important in explaining variation in stomatal conductance (Xie et al., 2021).
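
Where a segmented outline of the complex is available, fitting an ellipse gives orientation-independent estimates of SCL and SCW. The sketch below assumes OpenCV; the synthetic mask stands in for real segmentation output.

```python
# A minimal sketch assuming OpenCV; the synthetic mask stands in for a segmented stomatal complex.
import cv2
import numpy as np

mask = np.zeros((200, 200), dtype=np.uint8)
cv2.ellipse(mask, (100, 100), (60, 25), 30, 0, 360, 255, -1)   # toy complex rotated by 30 degrees

contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
(cx, cy), (axis_1, axis_2), angle = cv2.fitEllipse(contours[0])
scl, scw = max(axis_1, axis_2), min(axis_1, axis_2)   # lengths in pixels; convert using the image scale
sca = cv2.contourArea(contours[0])                    # approximate stomatal complex area in pixels
```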

Fig. 6.

Model output depending on machine learning analysis. (A) Broad bean stomatal detection and open/closed state classification (Li et al., 2023). (B) Black poplar stomatal pore segmentation (Song et al., 2020). (C) Sorghum stomatal complex segmentation (reprinted from Bheemanahalli et al., 2021, by permission of the American Society of Plant Biologists). (D) Maize stomatal complex and pavement cell instance segmentation (reprinted from Xie et al., 2021, by permission of the American Society of Plant Biologists).

Only one paper describes quantification of the area of a pair of guard cells to estimate maximum stomatal conductance (Gibbs et al., 2021), though another quantifies the volume of individual guard cells to understand the biomechanics of stomatal aperture (Davaasuren et al., 2022). Measurement of subsidiary cells alone has not yet been attempted to our knowledge. Measurement of these individual cell types would allow investigation of how cellular morphology is the product of developmental processes and influences stomatal function. Fewer than five papers extend the analysis to other epidermal cells that are not a part of the stomatal complex (Fig. 6D), enabling quantification of pavement cell density, length, width, and area (PCD, PCL, PCW, and PCA). When combined with SD, a count of all other epidermal cells allows calculation of the stomatal index (SI; number of stomatal complexes divided by the total number of epidermal cells), as well as a more in-depth investigation of how changes in leaf development drive changes in stomatal patterning, though this trait is only quantified in two papers (Xie et al., 2021; Zhu et al., 2021). These traits in particular are those which are completely unrealistic to collect manually in significant number, and thus are great examples of the wealth of additional trait data available through ML image analysis.

Computer vision tools developed to assess stomatal patterning

The field of ML has advanced rapidly in the last decade, as has the application of ML tools to stomatal phenotyping (Supplementary Table S1). There is an exceptionally broad array of methods by which the analysis has been carried out, but no study has attempted the labor-intensive process of a broad intercomparison of methods. This means it is not possible to report a fair, quantitative comparison of model performance. However, we will now review the toolkit of computer vision approaches developed over recent years that has been applied to assessing images of stomata. We start with classical non-learning methods as historical context, before focusing on AI/ML models as a progression of technological advances. Object detection algorithms using DL will be broadly classified into two categories: two-stage (proposal) and one-stage (proposal-free) networks, based on how many times an input image is passed through the network (Fig. 7).

Fig. 7.

Types of deep learning object detection algorithms. Categorization of common DL algorithms that appear in reviewed papers based on flow of image processing (modified from Viswanatha et al., 2022).

Classical non-learning computer vision approaches

Owing to the regular, repeated geometric structure of stomata, many classical computer vision approaches have exploited Fourier transforms (Brigham and Morrow, 1967) to capture this repeatability in the frequency domain. Watershed algorithms (Roerdink and Meijster, 2000; Duarte et al., 2017) have also been popularly used for segmenting individual stomatal cells from a leaf image, followed by morphological operators for tuning the boundaries of stomatal cells further (Sanyal et al., 2008; Aono et al., 2021). Another popular method of detecting stomatal cells has been using maximally stable extremal regions (MSER) (Liu et al., 2016; Zhang et al., 2022). This method is typically used for blob detection in images. Likewise, other methods based on wavelet transforms (Duarte et al., 2017), skeletonization (Jayakody et al., 2017), Chan–Vese segmentation (Li et al., 2019; Liang et al., 2022), and related techniques have also been proposed. While these kinds of techniques were found to have some utility, ML/DL-based methods have been found to perform much better for a wide variety of computer vision tasks, including assessment of stomata.

AI/ML: two-stage (region proposal-based) networks

The two-stage object detection strategy consists of: (i) region proposal and (ii) region classification as the major pipeline. Some of the common two-stage object detection algorithms are detailed below.

Region-based convolutional neural networks

One of the major players in the field of two-stage networks is the region-based convolutional neural network (R-CNN) series, the earliest of which is simply called R-CNN (Girshick et al., 2014). It takes different regions of interest (ROIs) from an image and uses a CNN to classify whether or not an object is present in each ROI. Using a selective search, the number of candidate ROIs can be limited to ~2000 per image; this constitutes the region proposal step. The proposals are then passed through CNN layers, which extract data-driven visual features from each ROI, and a support vector machine classifier is trained to decide whether or not an object is present in that region. This model has not been commonly applied to analysis of stomatal images, but it provided the foundation for further advances that have been applied to epidermal micrographs.

Fast region-based convolutional neural network

The second iteration in the R-CNN series is Fast R-CNN (Girshick, 2015). Computational efficiency is one of the main drawbacks of R-CNN: rather than passing each region proposal through the CNN separately, Fast R-CNN feeds the entire image through the CNN once, and the region proposals are pooled from the resulting feature map. This results in data-driven features which in turn help in identifying better ROIs, and makes Fast R-CNN much faster than the standard R-CNN. Because the majority of reviewed papers were published after 2020, it is more common to see subsequent iterations in the R-CNN series utilized.

Faster region-based convolutional neural network

The selective search process for the region proposals is slow and is a bottleneck to the overall efficiency of the model. Faster R-CNN eliminates the selective search block and instead lets the network itself learn the ROIs through a dedicated region proposal network (Ren et al., 2017). The predicted region proposals are then reshaped using an ROI pooling layer, which is used to classify the image within each proposed region and to predict the offset values for the bounding boxes. This significantly speeds up performance and makes Faster R-CNN suitable for real-time deployment for inference. Reviewed models utilizing Faster R-CNN include Li et al. (2019), Liang et al. (2022), Yang et al. (2021), Zhang et al. (2022), and Zhu et al. (2021).

Mask region-based convolutional neural network

Mask R-CNNs (He et al., 2017) were built using the pipeline of Faster R-CNN, with the major difference that while Faster R-CNN has two outputs for each candidate object (class label and the bounding-box offset), Mask R-CNN adds a third that provides an object mask for each instance of an object. Owing to their computational efficiency and high accuracy, Mask R-CNNs have been widely used for real-time object detection inference and have found applications in various domains, with detection of stomatal cells being one of them. Reviewed models utilizing Mask R-CNN include Bheemanahalli et al. (2021), Costa et al. (2021), Jayakody et al. (2021), Song et al. (2020), and Xie et al. (2021).

AI/ML: one-stage (proposal-free) networks

The one-stage object detection strategy consists of just the region classification as the main pipeline. Some of the common one-stage object detection algorithms are given below.

You only look once

You only look once (YOLO) passes the complete image through the network only once (Redmon et al., 2016). The image is divided into a grid of regions, and the network predicts, in a single shot, the class probabilities, bounding boxes, and offsets for each of these regions. It is an end-to-end trained network and makes all of its decisions at once. This has proven popular for image analysis in general, leading to a sequence of iterative improvements in the method. Reviewed papers utilizing YOLO include Casado-García et al. (2019, 2020), Li et al. (2023), Sultana et al. (2021), Yang et al. (2021), and Zhang et al. (2022).

Single-shot MultiBox detector

As the name suggests, both the object detection and classification tasks at hand are done in a single feed-forward pass of the network. This is referred to as the single shot. The MultiBox detector (Liu et al., 2015) is a regression technique for finding the bounding boxes from the image by minimizing a confidence loss (the objectness of a region) and a location loss (how far the predicted ROI is from the actual ROI). Together, they form the Single-shot MultiBox detector (SSD). One of the key advantages of this type of detector is prediction of objects of different scales and aspect ratios. This makes such detectors applicable to low-resolution images as well. Reviewed papers utilizing SSD include Kwong et al. (2021), Razzaq et al. (2021), Sakoda et al. (2019), and Yang et al. (2021).

Common challenges to ML-enabled analysis of stomatal patterning

Despite the successes achieved to date (Figs 2, 6; Supplementary Table S1), there remain challenges in the widespread adoption of AI/ML tools for stomatal phenotyping, and many of these challenges are innately present in all applications of ML models. The remainder of this review will describe these challenges, explain their significance relative to stomatal phenotyping, and make suggestions for optimally leveraging ML tools moving forward.

Ground-truth data

As previously stated, ML requires large amounts of ground-truth data. As an example, the most common task of stomatal detection for estimation of SD requires human experts to manually draw bounding boxes around stomata in hundreds of images. Zhang et al. (2022) produced a dataset of 2150 ground-truth images of maize in which 23 360 stomata were manually annotated by bounding box to train their modified YOLO DL model. The time required to generate these data is not described in the paper but could conservatively be estimated to have taken >25 h of manual annotation time, not including the stated 5.65 h of computational time. The model did achieve an F1-score of 0.97 on 26 novel maize images, but the time invested may only pay off if the model can be accurately applied to other larger datasets in the future.

Studies have attempted to find the minimum amount of ground-truth training data required to achieve acceptable performance, but results are not widely generalizable because of variation in imaging strategy and SD by species. For example, Bheemanahalli et al. (2021) utilized datasets with 20–300 ground-truth images of sorghum for training, and found the greatest performance on the largest dataset. Sakoda et al. (2019) tested datasets of 25–200 ground-truth images of soybean, and found optimal performance at 175 images, and decreased performance at 200 images. The majority of papers reviewed here utilize datasets ranging from 100 to 1000 images for their training.

Additionally, the time required to generate ground-truth data increases disproportionately when attempting to collect more trait data. While SD remains an important trait, there is also evidence that the SCL, SCW, and the size of the pore are important drivers of stomatal conductance (Franks et al., 2009; Xie et al., 2021). However, while 33 papers calculate SD, only 12 estimate pore size and even fewer calculate other stomatal traits (Fig. 2D). One reason might be that generating ground-truth data for segmentation tasks such as the segmentation of pores (Fig. 6B) or the segmentation of stomata (Fig. 6C) requires human annotators to accurately outline each object at the pixel level, which takes dramatically more time than simple bounding boxes. Papers attempting to segment pore size commonly use training datasets consisting of hundreds of images like those models extracting SD, but require much more time to generate a ground-truth dataset of the same number of images. The time required for manual annotation increases exponentially when attempting to phenotype other epidermal cells in addition to stomata alone, but therefore so does the time saved when using a multi-trait automated phenotyping model (Fig. 6D).

While there is always a time cost in generating ground-truth data, this is done with the expectation that there will be an increased benefit in improved performance of the new model on future images, which is not guaranteed, and depends on the nature of the images added to training, and the images in the new test datasets. However, if the training and validation datasets contain adequate phenotypic diversity representative of the test dataset for the traits being quantified, the time to build the model can be minimized. Additionally, a model with a combined training and validation dataset of 80 images deployed on a modest dataset of 20 images may be faster than manual measurements of the multiple traits for the 100 images in total.

Context dependency

Context dependency is the phenomenon where an ML model performs well on data similar to those on which it has been trained, but struggles to perform on data that contain novel forms and features. A model’s context can be described by a combination of factors including, but not limited to, species, plant age, sample method, image magnification, and image quality. Some of these factors may prove more significant than others, and some can be minimized by diversifying the training dataset by introducing novel biological material or image augmentations.

Within a species there exists a range of SD, cell sizes, and cell shapes. Variability will expand further when surveying across species, and stomatal morphology can be drastically different in different plant functional groups. Many researchers aimed to create models that could transfer their learning from the trained species to others of similar overall stomatal morphology, with varying degrees of success. Bhugra et al. (2018) trained a model on >13 000 high resolution and high magnification SEM images of rice stomata, and the authors claim that it can be transferred to wheat, though they do not quantify this performance. Song et al. (2020) provide a powerful example of using transfer learning to address context dependency. Their initial model for stomatal detection and pore segmentation was trained on 750 black poplar images and achieved a precision of 96.87% and recall of 96.72% on a black poplar test dataset. This black poplar model was directly applied to balsam poplar and ginkgo images collected with the same sampling methodology, but showed poor performance, with precisions of 12.4% and 64.6%, and recall of 7.2% and 32.4% for balsam poplar and ginkgo, respectively. When the pre-trained model based on black poplar data was fine-tuned using 80 balsam poplar images or 73 ginkgo images, the resulting two species-specific models had dramatically higher precisions (76.5% and 84.7%) and recall (80% and 69%) for balsam poplar and ginkgo, respectively. With training datasets a 10th of the size of the original model, these models achieved relatively good performance on novel species. This provides one pathway for existing tools to be rapidly adapted to a wider diversity of subject species.

Anatomical differences often lead to a significant difference in performance by a model trained on a dicot and applied to a monocot, and vice versa. Li et al. (2023) reported a precision of 0.934 for stomatal detection from a YOLO model trained on broad bean, a dicot plant. Applying that same model to the monocot wheat shows reduced precision of 0.894. A better understanding of the limitations of ML tools has led other researchers to create individual models for monocots and dicots (Gibbs et al., 2021; Sai et al., 2023).

Other approaches attempt to create a more generalizable model. Casado-García et al. (2020) demonstrate the success of a model trained simultaneously on species with distinct stomatal morphologies, namely the dumbbell-shaped guard cells of a monocot and the kidney-shaped guard cells of a dicot. Their combined model trained on common bean, barley, and soybean has an average F1-score of 0.93 when applied to novel images of all three species. There is a trade-off, however, as the combined model applied to a single species often produces more false positives than the model trained exclusively on that species. They also illustrate how poor performance can be when a model trained only on a monocot is presented with images from a dicot (barley model on soybean F1-score=0.05). When a model is exposed to diverse stomata in training, however, it can perform well across multiple species. It is crucial to consider, though, that their combined model required the generation of ground-truth data for >3000 images containing anywhere from 15 to 100 stomata each. Creation of this model still required manual analysis of >150 000 stomata.

Context-dependent performance is also not exclusive to changes in species. Li et al. (2022) demonstrate context dependency relative to sampling method. Their model was originally trained on bright-field microscopy images, but does not perform well on confocal images. They use the bright-field model as a base and employ transfer learning to train a model for confocal images on a smaller ground-truth dataset than they would need for a model trained from scratch. This transfer-learning confocal model yields comparable high precision in calculating SD (97.4%) to the original bright-field model (96.3%), while it only required six additional ground-truth confocal images compared with the 140 ground-truth bright-field images. In this case, transfer learning helps eliminate context dependency due to the sampling method, which makes the model available to a wider pool of researchers.

Toda et al. (2018, Preprint) illustrate the impact of variation in stomatal size. Their wheat model for stomatal detection applied to Brachypodium has dismal performance, only appearing to identify two out of >40 stomata. If the performance disparity was the result of the application of the model on a novel species with stomata that are intrinsically different from those of wheat, we would not expect changes to the sampling procedure to improve performance. However, because the two species share monocot stomatal morphology, the most significant difference is cell size. Imaging the Brachypodium samples at a higher magnification such that the stomata are similar pixel dimensions to those of wheat resulted in comparable stomatal detection on Brachypodium. Additionally, Li et al. (2019) demonstrated that model error was correlated with the size of the structure being segmented. The accuracy of the pore segmentation is greater on stomata that are more open than on those that are more closed. If the stomata are >40% open, the segmentation error rate is ~4.8%, while stomata that are 10–20% open have an error rate of 9.5%. If a researcher’s focus was on accurate pore size measurements, it might be wise to image samples at a higher magnification. Context dependency can be overcome, not only by diversifying the training dataset, but also by making changes to the sampling methodology depending on the research goals.

Often performance can be improved without even introducing any additional biological diversity to the training dataset. Casado-García et al. (2019) had an original training dataset and validation dataset consisting of 131 total images. They created three other augmented datasets containing the original images, as well as versions of those images that had been flipped or blurred, attempting to make the model’s performance more resilient to variations in image orientation and quality. Their results show an increased precision in SD calculation in the augmented datasets. Because these augmentations can often improve performance and can utilize existing manual annotation data, this can be an easy way to increase the generalizability of a model.

Model fitting

A model’s fit describes how closely a model’s predictions align with the supplied ground-truth data. Poor model fitting is a common challenge in ML. A model is considered underfit when predictions align neither with the training dataset nor with the validation dataset ground-truth data. In the context of stomata, this may commonly occur when a model is given too few instances of stomata to be able to learn patterns, or training is performed across too few epochs, or if the network is not deep enough to capture the non-linearities in the data structure. Increasing the size of the training dataset such that the model has more instances to learn from will often resolve underfitting, but researchers obviously want to minimize the amount of tedious manual annotation required. A model is considered overfit when predictions are very similar to the training dataset ground-truth, but not to the validation data ground-truth. This may occur if training images are selected in a biased manner that makes them in any way distinct from the validation data. Overfitting suggests that a model has learned to correctly identify the noise in a particular set of ground-truth training images, rather than learned the patterns for identifying stomata more generally. This is particularly common for DL because there are so many trainable parameters. Training a model for too many epochs, having too few training images, and noisy manual annotations are some of the common reasons for model overfitting. One effective way to detect overfitting is K-fold cross-validation, where the training data are divided into K equally sized subsets and, in each iteration, one of the subsets is used for validation and the remainder for training. This method is utilized in Aono et al. (2021), Davaasuren et al. (2022), Song et al. (2020), and Zhu et al. (2021). Overfitting can be prevented by methods such as data augmentation (applying transformations such as translation, flipping, and rotation to input images), batch normalization (normalizing the output of a previous layer in the CNN), dropout regularization (randomly deactivating a proportion of neurons in the CNN), and early stopping of training (halting training once the improvement in loss falls below a small threshold).
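
The K-fold scheme mentioned above is straightforward to set up; the sketch below assumes scikit-learn, and the number of images and folds are arbitrary.

```python
# A minimal sketch of K-fold cross-validation, assuming scikit-learn; sizes are arbitrary.
import numpy as np
from sklearn.model_selection import KFold

image_ids = np.arange(50)                                 # hypothetical annotated images
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_ids)):
    train_images = image_ids[train_idx]                   # train a fresh model on these images...
    val_images = image_ids[val_idx]                       # ...and validate it on these
    # Large gaps between training and validation performance across folds indicate overfitting.
```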

Computational requirements

DL has gained significant popularity across scientific research domains due to its ability to learn data-driven features and the possibility of continuous improvement. However, this has come with a substantial demand for computing power. Thompson et al. (2020) surveyed 1527 research papers and concluded that computational requirements have escalated dramatically over time. This has become one of the major bottlenecks for research in resource-constrained labs.

Overparameterization

Having a greater number of model parameters than available training instances often leads to inferior performance; this is called overparameterization. The computational complexity of training a deep NN scales with the number of model parameters. Although the resulting demand for greater computing power has driven the production of more powerful GPUs at lower cost, the quality of data and of acquisition sensors is also improving (yielding ever more training data points). Hence, the need for more capable computing systems is likely to persist for future DL-based models.
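As a rough diagnostic, the number of trainable parameters can be compared with the number of annotated training instances; the sketch below assumes a PyTorch/torchvision model and uses a generic backbone purely for illustration.

```python
# Count trainable parameters to gauge how heavily parameterized a model is
# relative to the available annotated training instances.
import torchvision

model = torchvision.models.resnet18(weights=None)  # stand-in architecture
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_instances = 5000  # e.g. number of annotated stomata in the training set

print(f"{n_params:,} trainable parameters for {n_instances:,} training instances")
```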

Image quality

Image quality can obviously have a significant impact on model performance. While humans are highly adaptable in their ability to view novel datasets and identify repeated features within them, such as patterns of stomata in the epidermis, the models studied in this review largely detect objects by the appearance of the object itself, without incorporating information about patterning. Jayakody et al. (2021) quantified the effect of image quality on model performance. They found that their model, trained on images spanning different species, imaging methods, and qualities, largely maintained precision as image quality decreased, but recall declined drastically. Fetter et al. (2019) demonstrated the same phenomenon by first classifying images by their entropy, a measure of image contrast and noise; they found that reductions in image quality most often led to underestimation of SD.
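The sketch below illustrates entropy-based screening of image quality using scikit-image (in the spirit of, but not identical to, the approach of Fetter et al., 2019); the file name and threshold are hypothetical and would need to be tuned to each dataset.

```python
# Flag low-entropy (low-contrast, potentially low-quality) micrographs before analysis.
from skimage import io
from skimage.measure import shannon_entropy

image = io.imread("epidermis_001.png", as_gray=True)  # hypothetical file name
entropy = shannon_entropy(image)

ENTROPY_THRESHOLD = 4.0  # assumed, dataset-specific cut-off
if entropy < ENTROPY_THRESHOLD:
    print(f"Low-quality image (entropy = {entropy:.2f}); consider re-imaging")
```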

Class imbalance and object heterogeneity

Data describing the patterning of non-stomatal cell types in the epidermis can be valuable for understanding leaf development as well as function. Phenotyping other cell types, such as pavement cells, presents additional challenges beyond those encountered when assessing stomatal traits alone. Training a model simultaneously on pavement cells and stomata faces a class imbalance problem, since pavement cells are five times more common than stomata. Because of this imbalance, a model could simultaneously overfit pavement cells and underfit stomata. In addition, pavement cells usually show much more heterogeneity in shape and size than stomata, so a model must be more robust to perform well across the full range of pavement cell morphologies.

While multiple techniques to address class imbalance are available (Dutta et al., 2020), few biologically focused studies have adopted them. Popular methods for handling class imbalance include the following.

Re-balancing the dataset

This approach balances the data distribution across classes by re-sampling, and is applied to the available data before they are fed into the network for training. Some of the most common methods are listed below; a minimal re-sampling sketch follows the list.

  • (i) Naive undersampling: only a subset of the samples from the majority class is retained and the remainder is discarded. This inevitably sacrifices a significant amount of valuable training data, in exchange for more balanced model performance across classes.

  • (ii) Selective decontamination (Barandela et al., 2003): this is also an undersampling technique, but samples are not discarded at random. Instead, the Euclidean distances between samples within the majority class are computed, and only those samples whose k nearest neighbors agree with each other completely are retained.

  • (iii) Naive oversampling (Zhang et al., 2018): the data of the minority classes are augmented (by flipping, rotation, cropping, resolution manipulation, etc.) until they match the number of samples in the majority class.

  • (iv) GAN-based augmentation (Xian et al., 2018): this technique uses deep generative models to generate additional ‘fake’ images of minority class samples, or of their features.
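The following is a minimal sketch of naive undersampling and naive oversampling on hypothetical per-cell annotation records; the class labels and counts are illustrative only.

```python
# Re-balance a two-class annotation set by naive undersampling or oversampling.
import random
random.seed(0)

annotations = (
    [{"label": "pavement"} for _ in range(500)]   # majority class
    + [{"label": "stoma"} for _ in range(100)]    # minority class
)
pavement = [a for a in annotations if a["label"] == "pavement"]
stomata = [a for a in annotations if a["label"] == "stoma"]

# (i) Naive undersampling: keep only as many majority samples as minority samples.
undersampled = random.sample(pavement, k=len(stomata)) + stomata

# (iii) Naive oversampling: duplicate (in practice, augment) minority samples
# until both classes contain the same number of instances.
oversampled = pavement + random.choices(stomata, k=len(pavement))
```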

Cost-sensitive learning classifier

Here, the aim is to learn a better classifier from the imbalanced dataset as it is, without any manipulation of the data samples. Instead, the learning objective function is adjusted to accommodate the class imbalance in the data. Common cost-sensitive loss functions include the following.

  • (i) Focal loss (Lin et al., 2017) and class-balanced focal loss (Cui et al., 2019): focal loss was proposed to compensate for imbalance between foreground and background data. For example, in the context of stomatal detection from leaf images, the stomatal cells occupy only a very small part of each image, so the number of pixels representing stomatal cells is small. With a simple modification of the cross-entropy loss, the class-balanced version incorporates the number of samples into the loss function and assigns an appropriate weight to the loss of each class, with the minority class receiving higher weights (a minimal focal loss sketch follows this list).

  • (ii) Diversity regularizer (Hayat et al., 2019): this term is added to the overall learning objective and helps to separate majority and minority class features in the latent space at an equal distance from each other. It has been found to provide significant improvements in overall performance.
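The sketch below is a minimal PyTorch implementation of binary focal loss following Lin et al. (2017); the alpha and gamma values are typical defaults rather than values taken from any of the reviewed stomatal studies.

```python
# Focal loss for a binary foreground/background problem: easy, abundant background
# examples (p_t close to 1) are down-weighted, so rare stomatal pixels dominate.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: tensors of shape (N,); targets are 0/1 labels."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                  # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(logits, targets))
```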

Phenotyping non-stomatal epidermal cells

As previously stated, only four of the reviewed papers attempt to quantify non-stomatal epidermal cells in addition to stomata. Two of these papers phenotype pavement cells by training an additional model, separate from that used for stomata. Zhu et al. (2021) employ one model for stomatal detection and a separate U-Net model for semantic segmentation of pavement cells in wheat. However, semantic segmentation, as described previously, only yields a prediction of the entire pavement cell network without distinguishing individual cell instances. An additional step counts the individual pavement cells, after which the SI is calculated. Saponaro et al. (2017) similarly use separate CNN models for stomatal detection and pavement cell semantic segmentation, with somewhat successful results (Supplementary Table S1).
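The post-processing step described above can be illustrated by the sketch below, which separates a binary pavement cell mask into individual cells by connected-component labelling (SciPy) and combines the count with a stomatal count to compute SI; the mask and counts are toy values, not outputs of the published models.

```python
# Count individual pavement cells from a semantic (whole-network) mask, then
# compute the stomatal index (SI) as stomata / (stomata + pavement cells) x 100.
import numpy as np
from scipy import ndimage

pavement_mask = np.array([[1, 1, 0, 0, 1],
                          [1, 1, 0, 0, 1],
                          [0, 0, 0, 0, 1]], dtype=bool)   # toy binary mask
labeled_cells, n_pavement = ndimage.label(pavement_mask)  # instance labels from the mask

n_stomata = 2                                             # from the detection model
si = 100 * n_stomata / (n_stomata + n_pavement)
print(f"{n_pavement} pavement cells, SI = {si:.1f}%")
```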

Two published models simultaneously predicted stomatal and pavement cells. Xie et al. (2021) used Mask-RCNN trained on only 33 optical topometry images from a maize recombinant inbred line (RIL) population. However, this corresponded to a large number of cell instances because the pixel boundary of every stomatal and pavement cell was annotated. The performance of the model on the entire maize RIL dataset (>3200 images in each of 2 years) showed a strong correlation between manual and predicted values of SD and pavement cell density (R2=0.974, R2=0.961). Li et al. (2022) described another model, LeafNet, which was trained on bright-field images of Arabidopsis epidermal peels in which stomatal and pavement cells were annotated. The results for SD (96.3% precision, 98.1% recall) and pavement cell segmentation quality (a metric incorporating TP and IoU, 88.6%) are similarly strong. In addition, transfer learning and retraining with only six images from an alternative confocal imaging modality produced comparable performance for SD (97.4% precision, 97.4% recall) and pavement cell segmentation quality (93.4%). A GUI has also been published for LeafNet for the benefit of researchers with less programming experience. Together, these two models illustrate the capacity of ML tools to simultaneously and accurately quantify stomatal and pavement cell traits across intraspecific diversity, in species with similar epidermal cell morphology, and across imaging modalities.

Future directions

Taken in their entirety, the reviewed papers suggest the following current best practices and future directions in the field of supervised ML applied to stomatal traits. Supplementary Table S1 compiles the key features of the models from the reviewed papers, and is a good resource for finding potential starting points in the development of a novel ML analysis tool suited to a given combination of equipment, species, desired trait output, and programming experience. When building a new model, researchers should first ensure optimization of their sample collection and imaging methods to minimize technical variation between images, while maintaining a methodology that is sufficiently high throughput. Subsequently, a modest dataset for ground-truth annotation should be assembled that contains the diversity the model will regularly encounter in its application. Ground-truth dataset assembly should seek to represent maximal sample diversity in a minimal number of images, beginning with the fewest images that capture the observable range of the trait of interest, in order to reduce both manual annotation time and computational costs. Model performance can then be re-tested with stepwise additions of further images until acceptable performance is reached. It is important to note that the number of instances of stomata in the training data matters more than the number of images per se; tens of images each containing tens of stomata could be equivalent to hundreds of images each containing one or two stomata.
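The stepwise strategy above can be expressed as a simple loop, sketched below; train_model and evaluate are hypothetical placeholders for the user's own routines, and the target score and step size are illustrative.

```python
# Grow the training set stepwise and stop annotating once validation performance
# is acceptable.
annotated_images = [f"img_{i:03d}.png" for i in range(200)]  # ordered to maximize diversity
val_images = [f"val_{i:03d}.png" for i in range(40)]

def train_model(paths):         # hypothetical training routine
    return {"n_train": len(paths)}

def evaluate(model, paths):     # hypothetical scoring routine (e.g. F1 for stomatal detection)
    return min(0.95, 0.5 + 0.002 * model["n_train"])

TARGET, STEP = 0.90, 25
for n in range(STEP, len(annotated_images) + 1, STEP):
    score = evaluate(train_model(annotated_images[:n]), val_images)
    print(f"{n} images -> validation score {score:.2f}")
    if score >= TARGET:
        break
```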

New AI/ML methods

There are too many recent advances in fundamental AI/ML methods to list all of the approaches that could be exploited for phenotyping stomatal patterning. Notable examples include active learning (Agarwal et al., 2020), which can make manual annotation for computer vision more efficient. Additionally, pre-trained foundation models, such as the segment anything model (SAM) from Meta (Kirillov et al., 2023), front-load training across huge datasets available on the internet in the hope of avoiding the need for task-specific training for problems such as object detection and segmentation of stomata. Successful cases of leveraging foundation models for biological research via fine-tuning (Ma et al., 2024) suggest the feasibility of this proposal.
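As an exploratory sketch, SAM's automatic mask generator (from the published segment-anything package) could be run on an epidermal micrograph without any task-specific training, as shown below; the checkpoint and file paths are placeholders, and useful performance on stomata is not guaranteed without fine-tuning.

```python
# Generate candidate cell/stoma masks with SAM's automatic mask generator.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # downloaded weights
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("epidermis_001.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts with 'segmentation', 'area', ...

# Candidate segments could then be filtered by area or shape before trait extraction.
print(f"{len(masks)} candidate segments")
```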

Phenotyping epidermal patterning and stomatal conductance

Advances in understanding of stomatal structure and function suggest there is value in phenotyping, and understanding, more than SD. AI/ML tools clearly have the ability to detect and segment individual cells. Therefore, there is an opportunity for deeper insights if models output more trait data about the stomatal complex and about the cells of the surrounding epidermis (Xie et al., 2021; Li et al., 2022). Modeling that not only characterizes stomatal structures from images but also incorporates underlying functional signals (e.g. gas exchange, photosynthesis), in order to better understand the biological processes that make plants more efficient in resource use, remains underexplored (Gibbs et al., 2021). Graph NNs have been successfully used to model various biological processes, such as disease prediction and drug discovery (Zhang et al., 2021), and may therefore contribute to a better understanding of these complex interactions, facilitating the identification of more efficient resource use in plants.

The dynamic behavior of stomata is crucial in regulating the flux of carbon and water through the soil–plant–atmosphere continuum and is essential for scaling leaf-level measurements. There have been limited advances in integrating imaging systems with models that can capture the temporal responses of stomata. NNs capable of extracting meaningful information and detecting changes in sequences of images (Weber et al., 2021) could significantly close this gap in the near future.

A universal model for phenotyping stomatal patterning?

The challenge of context dependency is noted in every paper reviewed here. The existing literature indicates that models can be created to perform reliably across significant genotypic and phenotypic diversity within a plant species, or across multiple species within a plant functional group, provided the sampling and imaging methods are relatively constant. It remains to be seen whether something approaching a ‘universal’ model can be developed to operate effectively across diverse species, developmental stages, environments, and/or data acquisition methods. Success in that endeavor would be transformative because it would eliminate many of the current barriers to adoption by the wider biological research community. In the meantime, as researchers produce additional models in more limited contexts (e.g. new individual species), they should take advantage of data augmentation to strengthen the training dataset and, where model performance falls off, should more frequently leverage transfer learning to minimize the additional time cost of generating a new model. Recent synthetic data generation techniques have yet to be leveraged in this space, but have been suggested as a way to further supplement training datasets (Gibbs et al., 2021).
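A minimal transfer learning sketch is shown below, assuming PyTorch/torchvision rather than any specific reviewed tool: a COCO-pretrained Mask R-CNN has its prediction heads replaced for a two-class (background + stoma) problem and its backbone frozen, so that relatively few newly annotated images are needed for fine-tuning.

```python
# Fine-tune a pretrained Mask R-CNN for stomatal instance segmentation by
# replacing its box and mask heads and freezing the pretrained backbone.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

num_classes = 2  # background + stoma
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

in_feat = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, num_classes)

in_feat_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feat_mask, 256, num_classes)

for p in model.backbone.parameters():   # freeze pretrained features; train only new heads
    p.requires_grad = False
```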

Quantitative model intercomparison with a standard dataset

It is notable that an unbiased, comprehensive, and quantitative comparison of how well different AI/ML methods can analyze epidermal micrographs and estimate stomatal patterning traits has not yet been performed. Such an exercise is likely to be valuable and will benefit from the development of a standard dataset on which to compare algorithm performance, as has been done for other image analysis problems (David et al., 2020). However, which model performs best may depend strongly on the species under study, as well as on the sampling and microscopy methods applied, so a collaborative effort to define the standard dataset would be very important. In addition, the pace of progress in the development of new AI/ML methods is rapid, which may mean that method comparisons struggle to keep pace with efforts to apply the latest analytical approaches.

Accessibility of ML tools and stomata as a model application of AI/ML

Finally, most models require some degree of competency in computer science to use, so developers of mature models should consider creating a user-friendly interface to facilitate widespread adoption. This indicates that further work is needed not only to improve model generalization but also to invest in interface and web development, making these tools more broadly available to researchers who wish to use them to answer biological questions.

While the above suggestions are made in relation to ML tools applied to the very specific problem of stomatal phenotyping, they undoubtedly reflect the trajectory of ML applications in other diverse fields. There is an inevitable lag between the development of novel ML techniques and their implementation by non-experts in their respective fields, but we hope that this review helps to efficiently direct the integration of these techniques into models for epidermal phenotyping.

Supplementary data

The following supplementary data are available at JXB online.

Table S1. Summary of the papers reviewed.


Glossary

Abbreviations:

AI, artificial intelligence; CNN, convolutional neural network; FN, false negative; FP, false positive; DL, deep learning; ML, machine learning; R-CNN, region-based convolutional neural network; SCA, stomatal complex area; SCL, stomatal complex length; SCW, stomatal complex width; SD, stomatal density; SI, stomatal index; TN, true negative; TP, true positive.

Contributor Information

Grace D Tan, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Program in Ecology, Evolution, and Conservation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Ushasi Chaudhuri, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Sebastian Varela, Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Independent Researcher, Canelones, 15800, Uruguay.

Narendra Ahuja, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Andrew D B Leakey, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Center for Advanced Bioenergy and Bioproducts Innovation, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Tracy Lawson, University of Essex, UK.

Conflict of interest

The authors have no conflicts of interest to declare.

Funding

This research was supported by the DOE Office of Science, Office of Biological and Environmental Research (BER), grant nos DE-SC0023160 and DE-SC0018277, as part of the DOE Center for Advanced Bioenergy and Bioproducts Innovation (U.S. Department of Energy, Office of Science, Biological and Environmental Research Program under award number DE-SC0018420), and by the Artificial Intelligence for Future Agricultural Resilience, Management and Sustainability Institute (Agriculture and Food Research Initiative (AFRI) grant no. 2020-67021-32799/project accession no. 1024178 from the USDA National Institute of Food and Agriculture).

References

  1. Ainsworth EA, Leakey ADB, Ort DR, Long SP.. 2008. FACE‐ing the facts: inconsistencies and interdependence among field, chamber and modeling studies of elevated [CO2] impacts on crop yield and food supply. New Phytologist 179, 5–9. [DOI] [PubMed] [Google Scholar]
  2. Andayani U, Sumantri IB, Pahala A, Muchtar MA.. 2020. The implementation of deep learning using convolutional neural network to classify based on stomata microscopic image of Curcuma herbal plants. IOP Conference Series: Materials Science and Engineering 851, 012035. [Google Scholar]
  3. Aono AH, Nagai JS, Dickel GDSM, Marinho RC, De Oliveira PEAM, Papa JP, Faria FA.. 2021. A stomata classification and detection system in microscope images of maize cultivars. PLoS One 16, e0258679. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Agarwal S, Arora H, Anand S, Arora C.. 2020. Contextual diversity for active learning. In: Vedaldi A, Bischof H, Brox T, Frahm JM, eds. Computer vision—ECCV 2020. Lecture Notes in Computer Science, vol. 12361. Cham: Springer, 137–153. [Google Scholar]
  5. Bai G, Jenkins S, Yuan W, Graef GL, Ge Y.. 2018. Field-based scoring of soybean iron deficiency chlorosis using RGB imaging and statistical learning. Frontiers in Plant Science 9, 1002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Barandela R, Rangel E, Sánchez JS, Ferri FJ.. 2003. Restricted decontamination for the imbalanced training sample problem. In: Sanfeliu A, Ruiz-Shulcloper J, eds, Progress in pattern recognition, speech and image analysis. CIARP 2003. Lecture Notes in Computer Science, vol. 2905. Berlin Heidelberg: Springer, 424–431. [Google Scholar]
  7. Bheemanahalli R, Wang C, Bashir E, Chiluwal A, Pokharel M, Perumal R, Moghimi N, Ostmeyer T, Caragea D, Jagadish SVK.. 2021. Classical phenotyping and deep learning concur on genetic control of stomatal density and area in sorghum. Plant Physiology 186, 1562–1579. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Bhugra S, Mishra D, Anupama A, Chaudhury S, Lall B, Chugh A.. 2018. Automatic quantification of stomata for high-throughput plant phenotyping. In: 24th International Conference on Pattern Recognition (ICPR). New York: IEEE, 3904–3910. [Google Scholar]
  9. Biscoe T. 1872. The breathing pores of leaves. The American Naturalist 6, 129–133. [Google Scholar]
  10. Braybrook SA, Kuhlemeier C.. 2010. How a plant builds leaves. The Plant Cell 22, 1006–1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Brigham EO, Morrow RE.. 1967. The fast Fourier transform. IEEE Spectrum 4, 63–70. [Google Scholar]
  12. Casado-García A, del-Canto A, Sanz-Saez A, et al. 2020. LabelStoma: a tool for stomata detection based on the YOLO algorithm. Computers and Electronics in Agriculture 178, 105751. [Google Scholar]
  13. Casado-García A, Domínguez C, García-Domínguez M, Heras J, Inés A, Mata E, Pascual V.. 2019. CLoDSA: a tool for augmentation in classification, localization, detection, semantic segmentation and instance segmentation tasks. BMC Bioinformatics 20, 323. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Chaves MM, Maroco JP, Pereira JS.. 2003. Understanding plant responses to drought—from genes to the whole plant. Functional Plant Biology 30, 239. [DOI] [PubMed] [Google Scholar]
  15. Costa L, Archer L, Ampatzidis Y, Casteluci L, Caurin GAP, Albrecht U.. 2021. Determining leaf stomatal properties in citrus trees utilizing machine vision and artificial intelligence. Precision Agriculture 22, 1107–1119. [Google Scholar]
  16. Cui Y, Jia M, Lin TY, Song Y, Belongie S.. 2019. Class-balanced loss based on effective number of samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 9268–9277. [Google Scholar]
  17. Davaasuren D, Chen Y, Jaafar L, Marshall R, Dunham AL, Anderson CT, Wang JZ.. 2022. Automated 3D segmentation of guard cells enables volumetric analysis of stomatal biomechanics. Patterns (New York, N.Y.) 3, 100627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. David E, Madec S, Sadeghi-Tehran P, et al. 2020. Global wheat head detection (GWHD) a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods. Plant Phenomics 2020, 3521852. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Dey B, Ahmed R, Ferdous J, Haque MMU, Khatun R, Hasan FE, Uddin SN.. 2023. Automated plant species identification from the stomata images using deep neural network: a study of selected mangrove and freshwater swamp forest tree species of Bangladesh. Ecological Informatics 75, 102128. [Google Scholar]
  20. Duarte KTN, Carvalho MAGD, Martins PS.. 2017. Segmenting high-quality digital images of stomata using the wavelet spot detection and the watershed transform. In: Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Porto, Portugal, 540–547.
  21. Dutta T, Singh A, Biswas S.. 2020. Adaptive margin diversity regularizer for handling data imbalance in zero-shot SBIR. In: Vedaldi A, Bischof H, Brox T, Frahm J-M, eds. Computer vision—ECCV 2020. Lecture Notes in Computer Science, vol. 12350. Cham: Springer International Publishing, 349–364. [Google Scholar]
  22. Ellison EE, Nagalakshmi U, Gamo ME, Huang P, Dinesh-Kumar S, Voytas DF.. 2020. Multiplexed heritable gene editing using RNA viruses and mobile single guide RNAs. Nature Plants 6, 620–624. [DOI] [PubMed] [Google Scholar]
  23. Ferguson JN, Fernandes SB, Monier B, et al. 2021. Machine learning-enabled phenotyping for GWAS and TWAS of WUE traits in 869 field-grown sorghum accessions. Plant Physiology 187, 1481–1500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Ferguson JN, Schmuker P, Dmitrieva A, et al. 2024. Reducing stomatal density by expression of a synthetic epidermal patterning factor increases leaf intrinsic water use efficiency and reduces plant water use in a C4 crop. Journal of Experimental Botany 75, doi: 10.1093/jxb/erae289 [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Fetter KC, Eberhardt S, Barclay RS, Wing S, Keller SR.. 2019. StomataCounter: a neural network for automatic stomata identification and counting. New Phytologist 223, 1671–1681. [DOI] [PubMed] [Google Scholar]
  26. Franks PJ, Drake PL, Beerling DJ.. 2009. Plasticity in maximum stomatal conductance constrained by negative correlation between stomatal size and density: an analysis using Eucalyptus globulus. Plant, Cell & Environment 32, 1737–1748. [DOI] [PubMed] [Google Scholar]
  27. Franks PJ, Farquhar GD.. 2007. The mechanical diversity of stomata and its significance in gas-exchange control. Plant Physiology 143, 78–87. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Freeling M. 1992. A conceptual framework for maize leaf development. Developmental Biology 153, 44–58. [DOI] [PubMed] [Google Scholar]
  29. Gibbs JA, Mcausland L, Robles-Zazueta CA, Murchie EH, Burgess AJ.. 2021. A deep learning method for fully automatic stomatal morphometry and maximal conductance estimation. Frontiers in Plant Science 12, 780180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Girshick R. 2015. Fast R-CNN. In: 2015 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 1440–1448. [Google Scholar]
  31. Girshick R, Donahue J, Darrell T, Malik J.. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 580–587.
  32. Goodfellow I, Bengio Y, Courville A.. 2016. Deep learning. Cambridge, MA: The MIT Press. [Google Scholar]
  33. Grinblat GL, Uzal LC, Larese MG, Granitto PM.. 2016. Deep learning for plant identification using vein morphological patterns. Computers and Electronics in Agriculture 127, 418–424. [Google Scholar]
  34. Haus MJ, Kelsch RD, Jacobs TW.. 2015. Application of optical topometry to analysis of the plant epidermis. Plant Physiology 169, 946–959. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Hayat M, Khan S, Zamir SW, Shen J, Shao L.. 2019. Gaussian affinity for max-margin class imbalanced learning. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 6468–6478. [Google Scholar]
  36. He K, Gkioxari G, Dollar P, Girshick R.. 2017. Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2980–2988. [Google Scholar]
  37. Hetherington AM, Woodward FI.. 2003. The role of stomata in sensing and driving environmental change. Nature 424, 901–908. [DOI] [PubMed] [Google Scholar]
  38. Jayakody H, Liu S, Whitty M, Petrie P.. 2017. Microscope image based fully automated stomata detection and pore measurement method for grapevines. Plant Methods 13, 94. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Jayakody H, Petrie P, de Boer HJ, Whitty M.. 2021. A generalised approach for high-throughput instance segmentation of stomata in microscope images. Plant Methods 17, 27. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Karabourniotis G. 2001. Epicuticular phenolics over guard cells: exploitation for in situ stomatal counting by fluorescence microscopy and combined image analysis. Annals of Botany 87, 631–639. [Google Scholar]
  41. Kirillov A, Mintun E, Ravi N, et al. 2023. Segment anything. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 3992–4003. [Google Scholar]
  42. Kwong QB, Wong YC, Lee PL, Sahaini MS, Kon YT, Kulaveerasingam H, Appleton DR.. 2021. Automated stomata detection in oil palm with convolutional neural network. Scientific Reports 11, 15210. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Laga H, Shahinnia F, Fleury D.. 2014. Image-based plant stomata phenotyping. In: 2014 13th International Conference on Control Automation Robotics & Vision (ICARCV). New York: IEEE, 217–222. [Google Scholar]
  44. Leakey ADB, Ferguson JN, Pignon CP, Wu A, Jin Z, Hammer GL, Lobell DB.. 2019. Water use efficiency as a constraint and target for improving the resilience and productivity of C3 and C4 crops. Annual Review of Plant Biology 70, 781–808. [DOI] [PubMed] [Google Scholar]
  45. Lee S, Kim H, Ishikawa M, Higuchi H.. 2019. 3D nanoscale tracking data analysis for intracellular organelle movement using machine learning approach. In: 2019 International Conference on Artificial Intelligence in Information and Communication (ICAIIC). New York: IEEE, 181–184. [Google Scholar]
  46. Li K, Huang J, Song W, Wang J, Lv S, Wang X.. 2019. Automatic segmentation and measurement methods of living stomata of plants based on the CV model. Plant Methods 15, 67. [DOI] [PMC free article] [PubMed] [Google Scholar]
  47. Li S, Li L, Fan W, Ma S, Zhang C, Kim JC, Wang K, Russinova E, Zhu Y, Zhou Y.. 2022. LeafNet: a tool for segmenting and quantifying stomata and pavement cells. The Plant Cell 34, 1171–1188. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Li X, Guo S, Gong L, Lan Y.. 2023. An automatic plant leaf stoma detection method based on YOLOv5. IET Image Processing 17, 67–76. [Google Scholar]
  49. Liang X, Xu X, Wang Z, et al. 2022. StomataScorer: a portable and high‐throughput leaf stomata trait scorer combined with deep learning and an improved CV model. Plant Biotechnology Journal 20, 577–591. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Lin T-Y, Goyal P, Girshick R, He K, Dollár P.. 2017. Focal loss for dense object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 318–327. [DOI] [PubMed] [Google Scholar]
  51. Liu S, Tang J, Petrie P, Whitty M.. 2016. A fast method to measure stomatal aperture by MSER on smart mobile phone. In: Imaging and Applied Optics 2016, AIW2B.2.
  52. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC.. 2015. SSD: single shot multibox detector. In: Computer Vision—ECCV 2016. Lecture Notes in Computer Science, vol. 9905. Cham: Springer, 21–37. [Google Scholar]
  53. Lunn D, Kannan B, Germon A, Leverett A, Clemente TE, Altpeter F, Leakey ADB, Lunn J.. 2024. Greater aperture counteracts effects of reduced stomatal density on water use efficiency: a case study on sugarcane and meta-analysis. Journal of Experimental Botany 75, 6837–6849. doi: 10.1093/jxb/erae271. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Ma J, He Y, Li F, Han L, You C, Wang B.. 2024. Segment anything in medical images. Nature Communications 15, 654. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Meeus S, Van Den Bulcke J, Wyffels F.. 2020. From leaf to label: a robust automated workflow for stomata detection. Ecology and Evolution 10, 9178–9191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Melotto M, Underwood W, He SY.. 2008. Role of stomata in plant innate immunity and foliar bacterial diseases. Annual Review of Phytopathology 46, 101–122. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Mohanty SP, Hughes DP, Salathé M.. 2016. Using deep learning for image-based plant disease detection. Frontiers in Plant Science 7, 1419. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Ott T, Lautenschlager U.. 2022. GinJinn2: object detection and segmentation for ecology and evolution. Methods in Ecology and Evolution 13, 603–610. [Google Scholar]
  59. Pillitteri LJ, Torii KU.. 2012. Mechanisms of stomatal development. Annual Review of Plant Biology 63, 591–614. [DOI] [PubMed] [Google Scholar]
  60. Prakash PT, Banan D, Paul RE, Feldman MJ, Xie D, Freyfogle L, Baxter I, Leakey ADB.. 2021. Correlation and co-localization of QTL for stomatal density, canopy temperature, and productivity with and without drought stress in Setaria. Journal of Experimental Botany 72, 5024–5037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Razzaq A, Shahid S, Akram M, et al. 2021. Stomatal state identification and classification in quinoa microscopic imprints through deep learning. Complexity 2021, 1–9. [Google Scholar]
  62. Redmon J, Divvala S, Girshick R, Farhadi A.. 2016. You only look once: unified, real-time object detection. In: Proceedings, 29th IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 779–788. [Google Scholar]
  63. Ren S, He K, Girshick R, Sun J.. 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 1137–1149. [DOI] [PubMed] [Google Scholar]
  64. Roerdink JBTM, Meijster A.. 2000. The watershed transform: definitions, algorithms and parallelization strategies. Fundamenta Informaticae 41, 187–228. [Google Scholar]
  65. Sai N, Bockman JP, Chen H, Watson‐Haigh N, Xu B, Feng X, Piechatzek A, Shen C, Gilliham M.. 2023. STOMAAI: an efficient and user‐friendly tool for measurement of stomatal pores and density using deep computer vision. New Phytologist 238, 904–915. [DOI] [PubMed] [Google Scholar]
  66. Sakoda K, Watanabe T, Sukemura S, Kobayashi S, Nagasaki Y, Tanaka Y, Shiraiwa T.. 2019. Genetic diversity in stomatal density among soybeans elucidated using high-throughput technique based on an algorithm for object detection. Scientific Reports 9, 7610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Sanyal P, Bhattacharya U, Bandyopadhyay SK.. 2008. Analysis of SEM images of stomata of different tomato cultivars based on morphological features. In: 2008 Second Asia International Conference on Modelling & Simulation (AMS), 890–894.
  68. Saponaro P, Treible W, Kolagunda A, Chaya T, Caplan J, Kambhamettu C, Wisser R.. 2017. DeepXScope: segmenting microscopy images with a deep neural network. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). New York: IEEE, 843–850. [Google Scholar]
  69. Shalev-Shwartz S, Ben-David S.. 2014. Understanding machine learning: from theory to algorithms. Cambridge, UK: Cambridge University Press. [Google Scholar]
  70. Song G, Wang Q.. 2023. Species classification from hyperspectral leaf information using machine learning approaches. Ecological Informatics 76, 102141. [Google Scholar]
  71. Song W, Li J, Li K, Chen J, Huang J.. 2020. An automatic method for stomatal pore detection and measurement in microscope images of plant leaf based on a convolutional neural network model. Forests 11, 954. [Google Scholar]
  72. Stringer C, Wang T, Michaelos M, Pachitariu M.. 2021. Cellpose: a generalist algorithm for cellular segmentation. Nature Methods 18, 100–106. [DOI] [PubMed] [Google Scholar]
  73. Sultana SN, Park H, Choi SH, Jo H, Song JT, Lee J-D, Kang YJ.. 2021. Optimizing the experimental method for stomata-profiling automation of soybean leaves based on deep learning. Plants (Basel, Switzerland) 10, 2714. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Thompson NC, Greenewald K, Lee K, Manso GF.. 2020. The computational limits of deep learning. MIT initiative on the digital economy research brief, vol. 4. Cambridge, MA: MIT Press. [Google Scholar]
  75. Toda Y, Toh S, Bourdais G, Robatzek S, Maclean D, Kinoshita T.. 2018. DeepStomata: facial recognition technology for automated stomatal aperture measurement. bioRxiv doi: 10.1101/365098 [Preprint]. [DOI] [Google Scholar]
  76. Viswanatha V, Chandana RK, Ramachandra AC.. 2022. IoT based smart mirror using raspberry Pi 4 and YOLO algorithm: a novel framework for interactive display. Indian Journal of Science and Technology 15, 2011–2020. [Google Scholar]
  77. Vőfély RV, Gallagher J, Pisano GD, Bartlett M, Braybrook SA.. 2019. Of puzzles and pavements: a quantitative exploration of leaf epidermal cell shape. New Phytologist 221, 540–552. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Wang H, Fu T, Du Y, et al. 2023. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60. [DOI] [PubMed] [Google Scholar]
  79. Weber M, Wald T, Zollner JM.. 2021. Temporal feature networks for CNN based object detection. In: 2021 IEEE Intelligent Vehicles Symposium (IV). New York: IEEE, 1478–1484. [Google Scholar]
  80. Willmer C, Fricker M.. 1996. Stomata, 2nd edn. Dordrecht: Springer. [Google Scholar]
  81. Xian Y, Lorenz T, Schiele B, Akata Z.. 2018. Feature generating networks for zero-shot learning. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 5542–5551. [Google Scholar]
  82. Xie J, Fernandes SB, Mayfield-Jones D, Erice G, Choi M, E Lipka A, Leakey ADB.. 2021. Optical topometry and machine learning to rapidly phenotype stomatal patterning traits for maize QTL mapping. Plant Physiology 187, 1462–1480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  83. Yang X, Xi Z, Li J, Feng X, Zhu X, Guo S, Song C.. 2021. Deep transfer learning-based multi-object detection for plant stomata phenotypic traits intelligent recognition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 20, 321–329. [DOI] [PubMed] [Google Scholar]
  84. Zhang F, Ren F, Li J, Zhang X.. 2022. Automatic stomata recognition and measurement based on improved YOLO deep learning model and entropy rate superpixel algorithm. Ecological Informatics 68, 101521. [Google Scholar]
  85. Zhang J, Shen F, Liu L, Zhu F, Yu M, Shao L, Shen HT, Van Gool L.. 2018. Generative domain-migration hashing for sketch-to-image retrieval. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer vision–ECCV 2018. Lecture Notes in Computer Science, vol. 11206. Cham: Springer, 304–321. [Google Scholar]
  86. Zhang X-M, Liang L, Liu L, Tang M-J.. 2021. Graph neural networks and their current applications in bioinformatics. Frontiers in Genetics 12, 690049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  87. Zhou J, Fu X, Zhou S, Zhou J, Ye H, Nguyen HT.. 2019. Automated segmentation of soybean plants from 3D point cloud using machine learning. Computers and Electronics in Agriculture 162, 143–153. [Google Scholar]
  88. Zhu C, Hu Y, Mao H, Li S, Li F, Zhao C, Luo L, Liu W, Yuan X.. 2021. A deep learning-based method for automatic assessment of stomatal index in wheat microscopic images of leaf epidermis. Frontiers in Plant Science 12, 716784. [DOI] [PMC free article] [PubMed] [Google Scholar]
