Abstract
Background
Automated segmentation of large amounts of image data is one of the major bottlenecks in high-throughput plant phenotyping. The dynamic optical appearance of developing plants, inhomogeneous scene illumination, and shadows and reflections in plant and background regions complicate automated segmentation of unimodal plant images. To overcome the problem of ambiguous color information in unimodal data, images of different modalities can be combined into a virtual multispectral cube. However, due to motion artifacts caused by the relocation of plants between photochambers, the alignment of multimodal images is often compromised by blurring artifacts.
Results
Here, we present an approach to automated segmentation of greenhouse plant images which is based on co-registration of fluorescence (FLU) and visible light (VIS) camera images, followed by separation of plant and marginal background regions using classification models tailored to different species and camera views. Our experimental results, including a direct comparison with manually segmented ground truth data, show that images of different plant types acquired at different developmental stages from different camera views can be automatically segmented with the average accuracy of () using our two-step registration-classification approach.
Conclusion
Automated segmentation of arbitrary greenhouse images exhibiting highly variable optical plant and background appearance is a challenging task for data classification techniques that rely on the detection of invariances. To overcome the limitations of unimodal image analysis, a two-step registration-classification approach to the combined analysis of fluorescence and visible light images was developed. Our experimental results show that this algorithmic approach enables accurate segmentation of different FLU/VIS plant images and is suitable for application in a fully automated, high-throughput manner.
Keywords: Greenhouse plant phenotyping, Visible light imaging, Fluorescence imaging, Multimodal image alignment, Supervised image segmentation, Machine learning
Background
Over the last two decades, high-throughput greenhouse phenotyping has become the method of choice for quantitative assessment of plant morphology, development and function. High-throughput screening platforms such as the LemnaTec-Scanalyzer3D (LemnaTec GmbH, Aachen, Germany) enable, depending on the configuration, the acquisition of thousands of fluorescence (FLU), visible light (VIS) and near-infrared (NIR) images that have to be processed and analyzed in an automated manner. The first essential step of plant image analysis, which determines the quality of all subsequently derived phenotypic traits, is robust and accurate segmentation (i.e., spatial localization) of plant structures. Straightforward segmentation of optically heterogeneous and noisy greenhouse images is, however, hampered by a combination of natural and technical factors, including the variable optical appearance of developing plants, inhomogeneous scene illumination, occlusions, and shadows and reflections in foreground and background image regions, see Fig. 1. Consequently, the same or similar colors may occur in plant and non-plant image regions, which makes simple color-thresholding techniques unsuitable. The principal difficulty of accurately segmenting optically complex and dynamic greenhouse images has been identified as the major bottleneck of high-throughput plant phenotyping [1].
State-of-the-art approaches to segmentation of plant images include:
- application of saliency approaches based on certain assumptions about image structure [2, 3], for example, that the majority of image pixels belong to the plant-free background region,
- construction of color-distance maps followed by thresholding or clustering [4, 5],
- application of supervised and unsupervised classification and machine learning models [6, 7],
- co-registration of different image modalities, e.g., visible light (VIS) and infrared (IR) images [8], or high-contrast fluorescence (FLU) and low-contrast visible light (VIS) or near-infrared (NIR) images [9].
Unfortunately, the prerequisites for saliency approaches are not always met: plant structures sometimes overgrow the field of view, so that the majority of pixels can no longer be considered background. Color-distance methods, although efficient and straightforward to implement, become less reliable in the presence of shadows and illumination changes. In such cases, reference images (i.e., background illumination without any plants) may deviate substantially from the background regions of plant-containing images. In particular, adult plants with large and/or many leaves cast large shadows that alter the original colors and intensity distribution of background regions and low-lying leaves.
Supervised machine learning and, in particular, deep learning techniques are nowadays successfully applied to plant image processing and analysis [10]. However, the optical appearance of diverse plant types under different experimental conditions exhibits large variability, which requires substantial effort to generate reliable ground truth data. Advanced deep learning methods, in particular, are known to require a large number of representative, manually annotated images. Such data may reasonably be generated for one or a few simple model species with the help of unskilled contributors [11], but this can hardly be extended to many other crop plant species imaged in different camera views, with different camera modalities, at different developmental stages. Promising statistical approaches to unsupervised segmentation of VIS plant images were presented in [12, 13]. Further investigations are, however, required to assess their robustness and efficiency when applied to large amounts of heterogeneous greenhouse plant images.
To overcome the above limitations of unimodal image analysis, the combination of images of different modalities, for example, high-contrast FLU and low-contrast VIS images, was suggested in our previous work [9]. Once the images are aligned, the binary mask of the segmented FLU image can be applied to extract target plant regions in structurally more complex and difficult VIS images. However, due to unavoidable inertial motion of plant leaves during relocation of plants from one photochamber to another, images of different modalities may exhibit relative non-uniform motion, which leads to locally inexact co-registration and inclusion of marginal background regions, see Fig. 2. In this work, we present a two-step algorithmic approach which combines multimodal image registration with subsequent detection and elimination of marginal background regions using supervised classification models of optical plant and background appearance in extended color spaces. Our experimental results show that combining spatial and color information from multimodal image co-registration and color classification enables accurate and robust segmentation of plant images in the context of high-throughput greenhouse plant phenotyping. Precompiled executables of our multimodal plant image registration-classification-segmentation pipeline, suitable for straightforward use in command-line scripts, accompany this work.
Methods
Image data acquisition and pre-processing
Visible light (VIS) and fluorescence (FLU) top-/side-view images of developing arabidopsis, wheat and maize shoots were acquired from high-throughput measurements over more than two weeks using three different LemnaTec-Scanalyzer3D platforms (LemnaTec GmbH, Aachen, Germany) for high-throughput phenotyping of small (e.g., arabidopsis), mid-size (e.g., wheat) and large (e.g., maize) plants, see Table 1. For a detailed specification of the VIS/FLU camera sensors and filters, we refer to our previous publication [14]. To quantify the accuracy of the image segmentation algorithms, images were segmented manually using our in-house kmSeg tool, which relies on efficient annotation of automatically pre-segmented image regions as plant or non-plant using k-means clustering of Eigen-colors [15].
Table 1.
Plants/views | # plants | # days | # angles | # FLU/VIS pairs | VIS size | FLU size |
---|---|---|---|---|---|---|
Arab.T./top | 4 | 20 | 1 | 80 | 2056 × 2454 | 1234 × 1624 |
Wheat/side | 4 | 47 | 3 | 564 | 1234 × 1624 | 1234 × 1624 |
Maize/side | 6 | 22 | 4 | 526 | 2056 × 2454 | 1038 × 1390 |
Distance-based pre-segmentation
Our previous investigations have shown that multimodal image registration is sensitive to structural differences between images such as background image gradient, shadows and reflections [9]. In order to improve robustness of multimodal image co-registration, VIS and FLU images are automatically pre-segmented using the following basic steps:
- computation of the Euclidean distance in the RGB color space between the reference (empty background) image and the plant-containing image,
- clustering of the distance image into a predefined number (N) of clusters using the fast equidistant k-means algorithm (in this work N = 25 was used to separate plant from noisy background regions),
- calculation of z-scores between the color distributions of background and plant-containing images for all N k-means clusters,
- selection of k-means clusters with z-score values of plant-background color distributions exceeding a certain threshold value (in this work the z-score threshold was used).
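The four steps above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: equal-width binning of distance values stands in for the fast equidistant k-means algorithm, the z-score is assumed to be a two-sample statistic on mean gray values, and `z_threshold` and `noise_floor` are hypothetical placeholder parameters whose values are not taken from this work.

```python
import math
from statistics import mean, pvariance


def presegment(plant, ref, n_clusters=25, z_threshold=3.0, noise_floor=1.0):
    """Return one boolean per pixel: True where the pixel is kept as a plant candidate.

    plant, ref: equally long lists of (R, G, B) tuples (flattened images).
    z_threshold and noise_floor are placeholder values (assumptions).
    """
    # Step 1: Euclidean distance in RGB space between plant and reference image.
    dist = [math.dist(p, r) for p, r in zip(plant, ref)]

    # Step 2 (stand-in): equal-width binning of the distances into n_clusters bins.
    d_max = max(dist) or 1.0
    labels = [min(int(d / d_max * n_clusters), n_clusters - 1) for d in dist]

    # Steps 3 and 4: z-like score per cluster, then selection by threshold.
    selected = set()
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        plant_gray = [sum(plant[i]) / 3 for i in idx]
        ref_gray = [sum(ref[i]) / 3 for i in idx]
        # Assumed definition: difference of means over pooled spread plus a noise floor.
        spread = math.sqrt(pvariance(plant_gray) + pvariance(ref_gray)) + noise_floor
        z = abs(mean(plant_gray) - mean(ref_gray)) / spread
        if z > z_threshold:
            selected.add(c)
    return [lab in selected for lab in labels]


# Usage: four near-background pixels and two green plant pixels.
reference = [(200, 200, 200)] * 6
image = [(200, 200, 200), (201, 199, 200), (199, 200, 201), (200, 201, 199),
         (30, 120, 40), (30, 120, 40)]
mask = presegment(image, reference)
```

In this toy example only the two green pixels fall into a cluster whose score exceeds the threshold, so only they remain in the pre-segmented mask.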
Pre-segmentation performed in this way eliminates background regions that would otherwise mislead the registration algorithms. As a result of pre-segmentation, one obtains a pair of almost ideally segmented FLU and roughly cleaned VIS images that exhibit the structural features (e.g., shape contours) required for detection of FLU/VIS image similarities and their automated alignment, see Fig. 3.
FLU/VIS image co-registration
Pre-segmented FLU and VIS images are automatically aligned using the iterative image co-registration scheme as described in [9]. The transformation matrix obtained from registration of pre-segmented FLU/VIS images is then used to mask the plant regions of VIS image corresponding to the automatically segmented and registered FLU binary mask, see Fig. 3.
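How a registered FLU mask could be used to mask a VIS image is sketched below. The sketch assumes that the transformation is a 3 × 3 homogeneous affine matrix mapping FLU pixel coordinates (x, y) to VIS coordinates, and it uses nearest-neighbor inverse mapping. The function names are hypothetical, and the actual registration scheme of [9] is not reproduced here.

```python
import math


def invert_affine(T):
    """Invert a 3x3 homogeneous affine matrix [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    (a, b, tx), (c, d, ty) = T[0], T[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular transformation")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def warp_mask(mask, T, out_h, out_w):
    """Warp a binary mask into an out_h x out_w grid (nearest-neighbor sampling)."""
    inv = invert_affine(T)
    h, w = len(mask), len(mask[0])
    out = [[False] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # Map each target pixel back into the source mask.
            sx = math.floor(inv[0][0] * x + inv[0][1] * y + inv[0][2] + 0.5)
            sy = math.floor(inv[1][0] * x + inv[1][1] * y + inv[1][2] + 0.5)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = bool(mask[sy][sx])
    return out


def apply_mask(vis, mask, fill=(0, 0, 0)):
    """Keep VIS pixels inside the mask and replace all others with a fill color."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(vis, mask)]


# Usage: shift a 2x2 FLU mask by (+2, +1) pixels into a 3x4 VIS image.
flu_mask = [[1, 0], [0, 1]]
shift = [[1, 0, 2], [0, 1, 1], [0, 0, 1]]
warped = warp_mask(flu_mask, shift, 3, 4)
vis = [[(9, 9, 9)] * 4 for _ in range(3)]
masked_vis = apply_mask(vis, warped)
```

Inverse mapping is used so that every target pixel receives a value, which avoids holes when the transformation includes scaling between images of different resolution.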
Transformation of RGB images to Eigen-color space
FLU and VIS images are transformed from RGB to the HSV (3D), Lab (3D) and CMYK (4D) color spaces and subsequently merged into a 10-dimensional (i.e., 3 + 3 + 4) color space representation. To improve the topological separability of color clusters, principal component analysis (PCA) of the 10-dimensional color space is performed to obtain an 'Eigen-color' image representation, see Fig. 4.
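As a rough illustration of the merged representation, the following sketch builds the 10-dimensional color vector of a single pixel. The conversion formulas are assumptions: HSV comes from Python's `colorsys`, Lab uses the standard sRGB/D65 formulas, and CMYK uses the naive device-independent formula, all of which may differ from the conversions used in this work. The PCA step is not shown.

```python
import colorsys


def srgb_to_lab(r, g, b):
    """Convert sRGB components in [0, 1] to CIE Lab (D65 white point assumed)."""
    def linearize(c):
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def rgb_to_cmyk(r, g, b):
    """Naive RGB to CMYK conversion for components in [0, 1]."""
    k = 1 - max(r, g, b)
    if k == 1:  # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    return ((1 - r - k) / (1 - k), (1 - g - k) / (1 - k), (1 - b - k) / (1 - k), k)


def color_vector(r, g, b):
    """Concatenate HSV (3), Lab (3) and CMYK (4) into one 10-D feature vector."""
    return [*colorsys.rgb_to_hsv(r, g, b), *srgb_to_lab(r, g, b), *rgb_to_cmyk(r, g, b)]


# Usage: feature vectors of a pure green and a white pixel.
green = color_vector(0.0, 1.0, 0.0)
white = color_vector(1.0, 1.0, 1.0)
```

Because the three color spaces have very different value ranges, some form of scaling before PCA is likely advisable. Whether and how the channels are scaled in this work is not stated.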
Data reduction using k-means clustering
Pixel-wise description of plant and background image structures leads to extremely large data sets, e.g., a 2592 × 3872 RGB image contains about 10^7 pixels, which can hardly be handled by conventional machine learning approaches. To reduce the amount of data, pre-segmented VIS and FLU images were subdivided into a small number of regions (N) using k-means clustering of pixel colors. Consequently, plant and background image regions were compactly described by the average colors of N k-means regions, or AC-KMR for short, see Fig. 3. The number of AC-KMR depends on the variability of colors in the image data. Juvenile, homogeneously colored plants photographed against a uniform background require fewer AC-KMR than more color-rich adult plants and/or noisy backgrounds. When images are segmented with models specific to plant type, age and camera view, the number of AC-KMR defined in the model is used.
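The reduction of pixel data to average colors of k-means regions can be sketched as follows. This is a plain Lloyd-style k-means in pure Python with a deterministic initialization chosen here for reproducibility. The function name and the initialization are assumptions, and the clustering variant used in this work may differ.

```python
import math


def kmeans_average_colors(pixels, n_regions, n_iter=20):
    """Cluster color tuples into n_regions and return (average colors, labels)."""
    # Deterministic initialization: evenly spaced samples from the pixel list.
    step = max(len(pixels) // n_regions, 1)
    centers = [pixels[min(i * step, len(pixels) - 1)] for i in range(n_regions)]

    def assign(current):
        # Label every pixel with the index of its nearest center.
        return [min(range(n_regions), key=lambda k: math.dist(p, current[k]))
                for p in pixels]

    for _ in range(n_iter):
        labels = assign(centers)
        updated = list(centers)
        for k in range(n_regions):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:  # keep the old center if a region is empty
                updated[k] = tuple(sum(ch) / len(members) for ch in zip(*members))
        if updated == centers:  # converged
            break
        centers = updated
    return centers, assign(centers)


# Usage: four pixels collapse into two average colors (dark and bright).
pixels = [(0, 0, 0), (2, 2, 2), (250, 250, 250), (252, 252, 252)]
avg_colors, labels = kmeans_average_colors(pixels, 2)
```

After this step, each image is described by only N average colors instead of millions of pixels, and these few color vectors are what the classifiers operate on.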
Plant/background color region classification
Binary classification of background and plant regions is performed using the average colors of N k-means regions (AC-KMR). In our experience, none of the conventional classifiers showed exceptional performance across all plant species and age categories (as discussed in detail later). Consequently, eight alternative classification models were trained to automatically separate AC-KMR using manually segmented and annotated data. Table 2 gives an overview of the eight binary models and the corresponding MATLAB (MathWorks, Inc.) functions that were used for supervised training and classification of plant and background regions. In the case of regression models (such as glm, gpr, linmod, svmreg), which provide a 'fuzzy estimate' of category association, assignment to either the plant (1) or the background (0) category is performed using the fixed threshold 0.5, i.e., estimates above the threshold are assigned to 1 and estimates below it to 0. In addition to the predictions of the eight individual classifiers, two further segmentation results are computed, namely the median and the fusion (i.e., logical OR) images of all eight classification models, see Fig. 5.
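The thresholding of regression outputs and the combination of the eight predictions into median and fusion results can be sketched as follows. Two details are assumptions not specified in the text: an estimate of exactly 0.5 is assigned to the background, and a 4:4 tie in the median vote is also resolved towards the background.

```python
from statistics import median


def to_binary(score, threshold=0.5):
    """Map a fuzzy estimate to plant (1) or background (0)."""
    return 1 if score > threshold else 0


def combine(predictions):
    """Combine per-classifier label lists into median and fusion (logical OR) labels.

    predictions: one list of 0/1 region labels per classifier, all of equal length.
    """
    per_region = list(zip(*predictions))
    # The median of an even number of labels can be 0.5, so it is thresholded again.
    median_labels = [to_binary(median(votes)) for votes in per_region]
    fusion_labels = [int(any(votes)) for votes in per_region]
    return median_labels, fusion_labels


# Usage: eight classifiers, three regions (unanimous, single vote, 4:4 tie).
scores = [
    [0.9, 0.7, 0.8],
    [0.8, 0.1, 0.9],
    [0.7, 0.2, 0.6],
    [0.9, 0.3, 0.7],
    [0.6, 0.0, 0.1],
    [0.8, 0.1, 0.2],
    [0.7, 0.2, 0.3],
    [0.9, 0.4, 0.4],
]
labels = [[to_binary(s) for s in row] for row in scores]
median_labels, fusion_labels = combine(labels)
```

The fusion result is the most permissive of the two, since a region is kept as plant if any single classifier votes for it, whereas the median result follows the majority.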
Table 2.
# | Classification model | Acronym | MATLAB function |
---|---|---|---|
1 | Naive Bayes model | bayes | fitcnb(X,Y) |
2 | Discriminant analysis | da | fitcdiscr(X,Y) |
3 | Generalized linear regression | glm | fitglm(X,Y) |
4 | Gaussian process regression | gpr | fitrgp(X,Y) |
5 | Linear model regression | linmod | fitlm(X,Y) |
6 | Binary support vector machine | svm | fitcsvm(X,Y) |
7 | Support vector machine regression | svmreg | fitrsvm(X,Y) |
8 | Neural network model (e.g., patternnet with a hidden layer of size N) | net | net=patternnet(N); train(net,X,Y) |
Small object removal
After application of the above segmentation steps, images may still contain small artifacts (typically solitary objects) that have to be removed in order to avoid errors in the subsequent calculation of phenotypic descriptors. Such small artifacts do not significantly affect the projection area, but can distort linear dimensions of segmented regions such as plant width, height, bounding box, convex hull, etc. Since growing plants exhibit different projection areas, one cannot rely on a fixed size threshold to remove such small artifacts: separately segmented leaf tips of small plant shoots can be of the same size as small background structures. Consequently, a set of classification models based on the eight classifiers from Table 2 was trained to detect small background (i.e., non-plant) structures on the basis of the object's color ratios (R/B, R/G, G/B), its size, and its vicinity to the largest image structure, assessed by the Euclidean distance map computed with the bwdist MATLAB function. For decision making, the median values of the object labels (i.e., plant or non-plant category) predicted by the eight classifiers were used.
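The object descriptors named above can be assembled as in the following sketch. It is only an illustration under stated assumptions: the color ratios are computed from the mean object color, and the vicinity is taken as the minimum Euclidean distance between object pixels and pixels of the largest structure, found here by brute force in place of a distance map. The function and key names are hypothetical.

```python
import math


def ratio(numerator, denominator):
    """Safe ratio that returns infinity for a zero denominator."""
    return numerator / denominator if denominator else float("inf")


def object_features(colors, coords, largest_coords):
    """Describe one small object by color ratios, size and distance.

    colors: (R, G, B) tuples of the object's pixels.
    coords: (row, col) positions of the object's pixels.
    largest_coords: (row, col) positions of the largest image structure.
    """
    n = len(colors)
    r = sum(c[0] for c in colors) / n
    g = sum(c[1] for c in colors) / n
    b = sum(c[2] for c in colors) / n
    # Brute-force stand-in for a Euclidean distance map lookup.
    distance = min(math.dist(p, q) for p in coords for q in largest_coords)
    return {
        "R/B": ratio(r, b),
        "R/G": ratio(r, g),
        "G/B": ratio(g, b),
        "size": n,
        "distance": distance,
    }


# Usage: a two-pixel object located three pixels away from the main structure.
features = object_features(
    colors=[(100, 50, 25), (100, 50, 25)],
    coords=[(0, 0), (0, 1)],
    largest_coords=[(0, 4), (3, 1)],
)
```

A feature vector of this kind would then be passed to each trained classifier, with the median of the predicted labels deciding whether the object is removed.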
Model evaluation measures
Accuracy of automated image segmentation was evaluated in terms of the confusion matrix [TP FP; FN TN] (TP: true positive, FP: false positive, FN: false negative, TN: true negative), calculated on the basis of manually annotated and algorithmically predicted binary classification of k-means color regions into either plant or background categories, and the overall model accuracy
$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN} \qquad (1)$$
which is closely related to other conventional measures, e.g., the Dice similarity coefficient (DSC). However, the accuracy derived from the confusion matrix is more sensitive to failures in region classification than a comparison of whole binary masks using the DSC.
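For illustration, the evaluation measures can be computed from region labels as in the sketch below, which assumes the usual definitions ACC = (TP + TN)/(TP + FP + FN + TN) and DSC = 2TP/(2TP + FP + FN). The function names are hypothetical.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = plant, 0 = background)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Overall accuracy: share of correctly classified regions."""
    return (tp + tn) / (tp + fp + fn + tn)


def dice(tp, fp, fn):
    """Dice similarity coefficient, which ignores true negatives."""
    return 2 * tp / (2 * tp + fp + fn)


# Usage: five regions with one false positive and one false negative.
truth = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
tp, fp, fn, tn = confusion_counts(truth, predicted)
acc = accuracy(tp, fp, fn, tn)
dsc = dice(tp, fp, fn)
```

Unlike the DSC, the accuracy also rewards correctly rejected background regions, because true negatives enter both the numerator and the denominator.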
Model training and evaluation scenarios
Binary classification models for plant/background separation were trained and evaluated using several different scenarios. In particular, all eight classification models listed in Table 2 were trained separately for three plant types (arabidopsis, wheat and maize), two camera views (top/side view), three image modalities (FLU, VIS and FLU+VIS) and different plant developmental stages, namely I (juvenile/small), II (mid-stage) and III (adult/large) shoots, as well as their combinations, i.e., I + II, I + III, II + III and I + II + III, resulting in a total of 288 case-scenario models. The reason for training so many models is that the optical appearance of plants and background regions varies significantly depending on the screening facility, camera view, plant type and developmental stage. Consequently, it is not clear a priori which of the mostly linear models would be capable of accurately separating such highly variable and heterogeneous data. To evaluate performance, trained models were tested on the same (training) data set as well as on three new samples corresponding to juvenile, mid-stage and adult plants.
Experimental results
A total of 80 arabidopsis, 526 maize and 564 wheat FLU/VIS image pairs (see Table 1) of plants at different developmental stages were semi-automatically segmented using the kmSeg tool [15] into two categories (i.e., plant or background) as described above.
The registration-classification pipeline was applied to segment FLU/VIS image pairs of arabidopsis, wheat and maize plants in three steps: (i) pre-segmentation of FLU/VIS images, (ii) automated co-registration of pre-segmented FLU/VIS images and (iii) classification of plant and non-plant structures in VIS image regions that were masked by the binary mask of the registered FLU image. As a consequence of different spatial resolutions and/or non-uniform leaf motion during relocation of plants from one photochamber to another, FLU/VIS alignment is not exact, which manifests itself in the inclusion of marginal background pixels, see Fig. 2. To remove marginal background regions in VIS images, eight distinct color models trained on all 288 case-scenarios, covering arabidopsis, wheat and maize plant/background appearance in different camera views and developmental stages, were applied. The performance of all 288 models was evaluated in terms of the confusion matrix between ground truth and predicted plant/background image regions by application to (i) the same (training) data set as well as three test samples corresponding to (ii) juvenile, (iii) mid- and (iv) adult stages of plant development. The results of all tests, including the confusion matrices and accuracy values, can be found in Additional file 1: Table S1. A brief summary of the model performance is shown in Fig. 6. As can be seen in Fig. 6a, all eight classification models are capable of reproducing the data they were trained on with an average accuracy of more than . Fig. 6b shows the distribution of accuracy across all 288 classification models, which exhibits the following cumulative statistics:
The best and worst performers in reproducing the training data are net and svmreg, respectively. That the non-linear neural net outperforms the linear models in the self-reproduction test is not surprising. However, when applied to other test samples, the net model does not appear to be advantageous in comparison to the linear models, see Fig. 6c. Furthermore, it is evident that all models that were trained with image data of adult plants (III) show significantly poorer performance when applied to juvenile plants (I). This can be traced back to significant differences between the color signatures of juvenile (typically light green) and adult (rather dark green and sometimes even yellow and/or red) plant leaves. With the exception of svmreg, most classification models show the best performance across plants of different developmental stages when they are trained on mixed data sets combining juvenile, mid-stage and adult plants (I + II + III).
The whole pipeline of registration-classification based segmentation (RCS) of multimodal greenhouse plant images is provided as an executable command-line tool
rcs.exe <input images> <output images> <class model> <opts>
suitable for script integration and high-context image processing from our homepage https://ag-ba.ipk-gatersleben.de/rcs.html. For a given pair of unregistered and unsegmented fluorescence and visible light images as well as reference FLU/VIS images, the RCS tool performs automated registration-classification based segmentation and writes out registered and segmented FLU/VIS images as well as further optional output files. Users who do not have reference images can generate them on their own by taking a typical background color of the plant-containing images, for example, black FLU and light gray VIS reference images of the same size as the plant-containing FLU/VIS images. A detailed description of the RCS tool can be found in the user guide available with the above file repository.
Discussion
Segmentation of large amounts of multimodal image data from greenhouse phenotyping experiments is the first challenging step of any image analysis pipeline aiming at quantitative plant phenotyping. However, straightforward segmentation of some image modalities, including widely used visible light images, is hampered by a number of natural and technical factors, including inhomogeneous illumination of photochambers, the dynamic optical appearance of developing plants, and shadows, reflections and occlusions in plant and background regions. To overcome the limitations of unimodal image analysis, an approach to plant image segmentation based on multimodal image registration followed by classification of plant and marginal background regions was developed. Our experimental results using eight conventional classifiers and a total of 288 case-scenario models covering different camera views, plant types and developmental stages demonstrate that plant segmentation with the average accuracy of (SD=) across all tested models can be achieved. More accurate segmentation can be performed using suitable case-scenario models, including FLU, VIS as well as combined FLU/VIS based classification of plant and marginal background regions. Furthermore, our evaluation studies show that classifiers trained on mixed image data including different plant developmental stages and optical appearances outperform classification models tailored to a narrow plant phenotype. Despite the broad spectrum of optical case-scenarios, our classification models are based on the optical setups of our three particular screening platforms, which exhibit light background regions in contrast to darker plant structures. In the case of strongly deviating optical conditions and/or plant appearance, retraining of the classification models should be considered.
Conclusion
The highly variable optical appearance of different plant and background structures makes segmentation of greenhouse images a non-trivial task. To overcome the shortcomings of unimodal image analysis, we suggest a two-step registration-classification approach which reduces the complexity of whole-image segmentation to the classification of pre-segmented fluorescence and visible light plant and marginal background image regions. Our experimental results demonstrate that this approach enables segmentation of different plant types at different developmental stages from different camera views with an accuracy sufficiently high for application in a fully automated, high-throughput manner. A command-line tool provided with this work enables quantitative plant researchers to efficiently integrate our registration-classification based image segmentation algorithms into custom image processing pipelines.
Acknowledgements
We would like to thank Mohammad-Reza Hajirezaei from the Molecular Plant Nutrition group of the IPK Gatersleben for kindly providing the image data of the arabidopsis growth experiment.
Authors’ contributions
MH, EG conceived, designed and performed the computational experiments, analyzed the data, wrote the paper, prepared figures and tables, reviewed drafts of the paper. AJ, KN executed the laboratory experiments, acquired image data, co-wrote the paper, reviewed drafts of the paper. TA co-conceptualized the project, reviewed drafts of the paper. All authors read and approved the final manuscript.
Funding
This work was performed within the German Plant-Phenotyping Network (DPPN) which is funded by the German Federal Ministry of Education and Research (BMBF) (project identification number: 031A053). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Availability of data and materials
Examples of data used and analyzed in this study are provided in supplementary materials. Further datasets are available from the corresponding author on request.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Consent and approval for publication from all the authors was obtained.
Competing interests
The authors declare that they have no competing interests.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary information accompanies this paper at 10.1186/s13007-020-00637-x.
References
- 1. Minervini M, Scharr H, Tsaftaris SA. Image analysis: the new Bottleneck in plant phenotyping. IEEE Signal Proc Mag. 2015;32:126–131. doi: 10.1109/MSP.2015.2405111.
- 2. Qiangqiang Z, Zhicheng W, Weidong Z, Yufei C. Contour-based plant leaf image segmentation using visual saliency. In: Zhang Y-J, editor. Image and graphics. Cham: Springer; 2015. pp. 48–59.
- 3. Cao Q, Xu L. Unsupervised greenhouse tomato plant segmentation based on self-adaptive iterative latent dirichlet allocation from surveillance camera. Agronomy. 2019;9:91. doi: 10.3390/agronomy9020091.
- 4. Ispiryan R, Grigoriev I, zu Castell W, Schäffner A. A segmentation procedure using colour features applied to images of Arabidopsis thaliana. Funct Plant Biol. 2013;40:1065–1075. doi: 10.1071/FP12323.
- 5. Klukas C, Chen D, Pape J-M. Integrated analysis platform: an open-source information system for high-throughput plant phenotyping. Plant Physiol. 2014;165(2):506–518. doi: 10.1104/pp.113.233932.
- 6. Tsaftaris S, Minervini M, Scharr H. Machine learning for plant phenotyping needs image processing. Trends Plant Sci. 2016;21:989–991. doi: 10.1016/j.tplants.2016.10.002.
- 7. Singh A, Ganapathysubramanian B, Sarkar S, Singh A. Deep learning for plant stress phenotyping: trends and future perspectives. Trends Plant Sci. 2018;23:883–898. doi: 10.1016/j.tplants.2018.07.004.
- 8. Wang X, Yang W, Wheaton A, Cooley N, Moran B. Efficient registration of optical and IR images for automatic plant water stress assessment. Comput Electr Agric. 2010;74:230–237. doi: 10.1016/j.compag.2010.08.004.
- 9. Henke M, Junker A, Neumann K, Altmann T, Gladilin E. Comparison and extension of three methods for automated registration of multimodal plant images. Plant Methods. 2019;15:44. doi: 10.1186/s13007-019-0426-8.
- 10. Kamilaris A, Prenafeta-Boldú F. Deep learning in agriculture: a survey. Comput Electr Agric. 2018;147:70–90. doi: 10.1016/j.compag.2018.02.016.
- 11. Giuffrida M, Chen F, Scharr H, Tsaftaris S. Citizen crowds and experts: observer variability in image-based plant phenotyping. Plant Methods. 2018;14:12. doi: 10.1186/s13007-018-0278-7.
- 12. Wang Y, Xu L. Unsupervised segmentation of greenhouse plant images based on modified Latent Dirichlet Allocation. Peer J. 2018;6:5036. doi: 10.7717/peerj.5036.
- 13. Zhang P, Xu L. Unsupervised segmentation of greenhouse plant images based on statistical method. Sci Rep. 2019;8:4465. doi: 10.1038/s41598-018-22568-3.
- 14. Junker A, Muraya MM, Weigelt-Fischer K, Arana-Ceballos F, Klukas C, Melchinger AE, Meyer RC, Riewe D, Altmann T. Optimizing experimental procedures for quantitative evaluation of crop plant performance in high throughput phenotyping systems. Front Plant Sci. 2015;5:770. doi: 10.3389/fpls.2014.00770.
- 15. Henke M, Junker A, Neumann K, Altmann T, Gladilin E. Semi-automated annotation of plant images using k-means clustering of Eigen-colors (kmSeg) (2020). https://ag-ba.ipk-gatersleben.de/kmseg.html