Geographic-Scale Coffee Cherry Counting with Smartphones and Deep Learning

Juan Camilo Rivera Palacio; Christian Bunn; Eric Rahn; Daisy Little-Savage; Paul Günter Schmidt; Masahiro Ryo

doi:10.34133/plantphenomics.0165

2024 Apr 3;6:0165. doi: 10.34133/plantphenomics.0165

PMCID: PMC10988386 PMID: 38572469

Abstract

Deep learning and computer vision, using remote sensing and drones, are 2 promising nondestructive methods for plant monitoring and phenotyping. However, their applications are infeasible for many crop systems under tree canopies, such as coffee crops, making it challenging to perform plant monitoring and phenotyping at a large spatial scale at a low cost. This study aims to develop a geographic-scale monitoring method for coffee cherry counting, supported by an artificial intelligence (AI)-powered citizen science approach. The approach uses basic smartphones to take a few pictures of coffee trees; 2,968 trees were investigated with 8,904 pictures in Junín and Piura (Peru) and in Cauca and Quindío (Colombia) in 2022, with the help of nearly 1,000 smallholder coffee farmers. We then trained and validated YOLO (You Only Look Once) v8 to detect cherries using the dataset from Peru. The average number of cherries per picture was multiplied by the number of productive branches to estimate the total number of cherries per tree. The model's performance in Peru showed an R² of 0.59. When the model was tested in Colombia, where different varieties are grown in different biogeoclimatic conditions, the model showed an R² of 0.71. The overall performance in both countries reached an R² of 0.72. The results suggest that the method can be applied to much broader scales and is transferable to other varieties, countries, and regions. To our knowledge, this is the first AI-powered method for counting coffee cherries, and it has the potential to enable geographic-scale, multiyear, photo-based phenotypic monitoring of coffee crops in low-income countries worldwide.

Introduction

Coffee is one of the most widely consumed beverages, produced across more than 70 tropical countries, involving 12 million smallholder farmers globally [1]. Coffee production faces the risk of declining yields and quality due to drastic changes in temperature and precipitation [2–4], frequent pest and disease outbreaks [5], unstable selling prices, and high input costs, such as fertilizers and herbicides. Furthermore, it is projected that up to 50% of the land suitable for coffee cultivation could be lost globally by 2050 due to climate change [6]. Therefore, phenotypic monitoring of coffee crops at large spatial scales is crucial for coping with climate change.

Recent advances in agricultural digitalization, including remote sensing, mechanistic simulation, and artificial intelligence (AI), have the potential to enable large-scale, accurate phenotypic monitoring for various crop types while addressing the high cost of measuring equipment, the cost of labor, the high complexity of the models or methods, the unavailability of Earth observations, and the lack of historical plant information including weather, soil, and management practices. For instance, yield prediction can utilize cameras equipped with RGB, red-edge, and multispectral bands, as well as unmanned aerial vehicles (UAVs). However, such devices are often too expensive for smallholder farmers, particularly in low-income countries. Recent studies have explored mapping in coffee using remotely sensed imagery. However, it remains challenging to apply remote sensing techniques because coffee trees are difficult to identify visually from space due to intercropping, frequent dense cloud coverage, and tree canopy cover [7].

To date, statistical modeling approaches have been the most popular method for estimating coffee crop productivity [8]. For example, [9] developed a method to count the number of cherries per tree by dividing the plant into 4 quadrants and then counting the nodes and cherries per node for each quadrant. Despite its high accuracy, it is time-consuming to design the division of the plant into quadrants for each individual plant. Moreover, it requires the condition of nonsteep and nonmuddy slopes in the field, which is impractical in many locations [8]. Other studies [10–12] propose productivity prediction using a sample of the productive lateral branches and an estimate of the quantity per lateral. Furthermore, [13] introduced a method based on genomic information for coffee production and [14] with agrometeorological information. Despite the high accuracy of these models, limitations persist concerning data acquisition and the breadth of available data.

In addition to statistical modeling, there is a growing number of studies employing machine learning methods. For instance, [15] developed a model based on an extreme learning machine and random forest algorithms to link soil fertility properties with Robusta coffee production in Vietnam, achieving a coefficient of determination (R²) of 0.60. In another study, [16] utilized UAV-taken aerial images of 144 trees in Minas Gerais. They employed 5 algorithms: linear support vector machines, gradient boosting regression, random forests, partial least squares regression, and neuroevolution of augmenting topologies, the latter proving to be the most effective, with a mean absolute percentage error of 32%. Similarly, [17] proposed segmentation and a convolutional neural network (CNN) for the Castillo coffee variety using mobile images, achieving an R² of 0.59. While these studies demonstrate the promising potential of AI applications for coffee production, they typically focus on one variety in a single country and require well-trained personnel for image acquisition. Collectively, none of the existing methods are suitable for estimating coffee crop yields on hundreds or thousands of smallholder farms at large spatial scales (i.e., scalability). A citizen science approach to data collection could address this scalability issue [18].

We aim to develop a geographic-scale, low-cost method to monitor the number of coffee cherries with the help of local farmers by using pictures captured on their mobile phones. To achieve this, we developed a relatively simple field sampling protocol that local farmers can follow individually without formal training. We collected images and applied the You Only Look Once (YOLO) v8 object detection method. Our approach garnered support from nearly 1,000 local coffee farmers, facilitating data collection across multiple regions in Peru and Colombia. Importantly, we evaluated the model's generalizability and transferability across countries and diverse coffee varieties, a novel effort not previously undertaken.

Materials and Methods

Study site

The target regions of this study (Fig. 1) are the northern and central parts of the coffee districts in Peru (Chinchaque, Chirinos, Cañariz, Lalaquiz, Pongoa, and San José Lourdes; latitude–longitude of 5°–6°S, 6°–7°W) and the south-western and western parts of the coffee municipalities in Colombia (Génova, Cajibío, El Tambo, Morales, Piendamó, and Popayán; 2°–4°N, 75°–77°W).

The target districts in Peru cover elevational gradients and produce highly heterogeneous environments with variations in soil types. The maximum monthly temperature ranges between 20 °C (July) and 34 °C (March), while the minimum monthly temperature varies between 14 °C (July to August) and 23 °C (February) [19] (region A in Fig. 1). In contrast, the target region in Colombia is humid and semihumid. Rainfall exhibits a bimodal pattern, peaking in April (120 mm) and October (110 mm). The average temperature ranges between 18.1 °C (October to November) and 21 °C (June to August) [20] (region B in Fig. 1). These 2 target regions exhibit distinct weather patterns; Colombia tends to be cooler and rainier than Peru.

Sampling campaign with local farmers

Data collection involved a survey of 977 farmers: 300 in Colombia and 677 in Peru. The majority of farmers surveyed (69% in Peru and 77% in Colombia) had a coffee area of less than 2 hectares (ha). A smaller percentage (19% in Colombia and 28% in Peru) had a coffee area between 2 and 5 ha, and only a fraction (3% in Colombia and 2% in Peru) had a coffee area greater than 5 ha. In addition, the majority of surveyed locations in Colombia (88%) were located at elevations between 1,500 and 2,000 m above mean sea level (A.M.S.L.), whereas in Peru, most (61%) were located at elevations between 1,000 and 1,500 m A.M.S.L.

This study focused on Arabica coffee (Coffea arabica L.). In Peru, the most common variety was Catimor cogollo verde (70%), followed by Catimor cogollo morado (8%), Tipica (4%), and Caturra (1%). Other varieties accounted for the remaining 17%. In Colombia, the majority of surveyed farmers used Castillo (90%), with small percentages of Supremo (4%), Variedad Colombia (1%), and other varieties (5%). In terms of tree age, 56% of the trees in Peru were between 3 and 7 years old, 39% were between 7 and 14 years old, and 5% were between 14 and 21 years old. In Colombia, 88% of the trees were between 3 and 7 years old, 12% were between 7 and 14 years old, and only 0.03% were between 14 and 21 years old.

Sampling protocols with a picture-based model and manual measurement

We received help from 53 survey personnel (hereafter, enumerators), who visited the local farmlands and took pictures of coffee trees through local partners during the cherry growing season, from March to August 2022 in Peru and from March to November 2022 in Colombia. The mobile pictures were collected using the enumerators' mobile devices. In total, 2,968 trees were investigated, yielding a collection of 8,904 pictures. Specifically, 2,450 trees were in Colombia and 518 in Peru, contributing to 7,350 and 1,554 pictures, respectively. The sampling protocol was designed as follows.

In each coffee field, 9 trees were selected at random. For each tree, 1 branch was randomly selected from each of the upper, middle, and lower positions (i.e., 3 branches per tree), and 1 photo was taken per branch. When taking a photo, the enumerators were instructed to adhere to the following guidelines: (a) capture photos during daylight hours, specifically between 6 AM and 6 PM; (b) focus on as many cherries as possible; (c) avoid moving the camera post-capture to prevent picture distortion; and (d) choose an angle that prevents direct sunlight from shining on the camera lens.
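The selection step of the protocol can be sketched in a few lines of Python. This is only an illustration: the article does not say how randomization was carried out in the field, so the function name `plan_field_sample`, the tree and branch identifiers, and the use of a seeded pseudo-random generator are all assumptions.

```python
import random

# Canopy positions named in the protocol.
BRANCH_POSITIONS = ("upper", "middle", "lower")


def plan_field_sample(tree_ids, branches_by_tree, n_trees=9, seed=None):
    """Pick n_trees trees at random, then one branch per canopy position.

    branches_by_tree maps tree id -> position -> list of candidate branches.
    Returns one entry per photo to be taken.
    """
    rng = random.Random(seed)
    chosen_trees = rng.sample(list(tree_ids), n_trees)
    plan = []
    for tree in chosen_trees:
        for position in BRANCH_POSITIONS:
            branch = rng.choice(branches_by_tree[tree][position])
            plan.append({"tree": tree, "position": position, "branch": branch})
    return plan


# Hypothetical field: 30 trees, 4 candidate branches per position.
trees = [f"tree_{i:02d}" for i in range(30)]
branches = {
    t: {p: [f"{p}_{k}" for k in range(4)] for p in BRANCH_POSITIONS}
    for t in trees
}
plan = plan_field_sample(trees, branches, seed=42)
print(len(plan))  # 9 trees x 3 branches = 27 photos
```

With 9 trees and 3 photos per tree, each field contributes 27 pictures, which matches the 3-pictures-per-tree ratio of the full dataset (8,904 pictures for 2,968 trees).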

We attempted to capture the most cherries on each branch using a mobile phone image because our model uses this information to estimate the total number of cherries. Earlier studies have explored cherry detection using mobile images, employing computer vision techniques such as segmentation [17], which is effective primarily for red cherries against green leaves, and incorporating machine learning and deep learning algorithms [21]. In this study, we proposed a protocol for photographing only 3 branches, rather than the entire tree, in order to capture as many cherries as possible.

The protocol was used to evaluate the reliability of using mobile images to replace the manual counting of cherries on branches. To assess the accuracy of our approach, we manually counted the cherries on the respective branches. The total number of cherries per tree was measured as follows: For each selected branch, the enumerator manually counted all cherries. An average number of cherries was calculated from the manually counted cherries of the upper, middle, and lower branches. This average was multiplied by the total number of productive branches on the tree, also counted manually. The product of this calculation was the total number of cherries per tree.

The images were taken using a variety of commercially available mobile phones owned by the enumerators. The most popular brands were Xiaomi, Samsung, and Motorola, with the Xiaomi M2006C3LG being the most common model. These phones capture RGB images in JPEG format and are equipped with internal Global Positioning System capabilities. Most images were 768 × 768 or 1,024 × 1,024 pixels and were taken without flash, with cameras whose resolutions ranged between 1 and 12 megapixels.

Image annotation

The mobile phone images were annotated in the PASCAL VOC format using the LabelImg graphical image annotation tool [22]. The length and width of the training images were rescaled to 640 pixels. Manual annotation began after the enumerators had captured the images in the coffee fields. We defined 3 categories for annotation: black cherries, red cherries, and green cherries. This categorization enables the model to recognize the primary colors of cherries throughout their growth cycle and detect them at any stage prior to harvest. A minimum bounding box was drawn around each cherry in these categories. In the case of occlusion, we followed the methodology proposed by [23], which recommends not labeling cherries if the occlusion is greater than 85% and the visible target area is less than 15%. When a cherry is obscured by others, the bounding rectangle should encompass the entire cherry, including any parts that are behind the others. The final annotated dataset consisted of 436 images with a total of 35,694 labeled cherries: 35,247 (98.7%) were labeled as green cherries, 342 (1.0%) as red cherries, and 105 (0.3%) as black cherries. As the cherries were mainly counted during the growth stage, when they are green, the other colors are uncommon in our dataset.
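Because PASCAL VOC stores boxes as absolute pixel corners while YOLO-style training labels use normalized center coordinates, a conversion step is usually needed between annotation and training. The article does not describe this step, so the sketch below is an assumption about how it could be done; the class names in `CLASSES`, the function name `voc_to_yolo`, and the example annotation are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed class names and order; the article only names the three colors.
CLASSES = ["green_cherry", "red_cherry", "black_cherry"]


def voc_to_yolo(xml_text, classes=CLASSES):
    """Convert one PASCAL VOC annotation (XML string) to YOLO label rows.

    Each row is (class_id, x_center, y_center, width, height), with all
    coordinates normalized by the image width and height.
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    rows = []
    for obj in root.iter("object"):
        class_id = classes.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        rows.append((
            class_id,
            (xmin + xmax) / 2 / img_w,  # normalized box center, x
            (ymin + ymax) / 2 / img_h,  # normalized box center, y
            (xmax - xmin) / img_w,      # normalized box width
            (ymax - ymin) / img_h,      # normalized box height
        ))
    return rows


# Hypothetical annotation with a single green cherry.
EXAMPLE_XML = """<annotation>
  <size><width>1024</width><height>1024</height><depth>3</depth></size>
  <object>
    <name>green_cherry</name>
    <bndbox><xmin>256</xmin><ymin>512</ymin><xmax>384</xmax><ymax>640</ymax></bndbox>
  </object>
</annotation>"""

rows = voc_to_yolo(EXAMPLE_XML)
print(rows)  # [(0, 0.3125, 0.5625, 0.125, 0.125)]
```

Normalized coordinates are unaffected by uniform rescaling of the image, so labels produced this way remain valid after the images are resized to 640 pixels.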

YOLO v8 model for cherry detection

CNNs were used for cherry detection. We used the state-of-the-art YOLO v8 object detection network [24]. This network is an advancement of the original YOLO framework [25] and previous versions [24,26–30]. YOLO is a real-time object detection framework that is designed for fast object detection and classification. It simplifies the detection task into a regression problem, converting image pixels directly into bounding box coordinates and class probabilities [25]. The framework uniquely performs object detection through a single CNN and uses the entire image to predict each bounding box.

During the training phase, YOLO divides an image into a 7 × 7 grid. If the center of an object falls within the boundaries of a grid cell, the cell is responsible for detecting the object. Each cell is responsible for predicting bounding boxes, confidence scores, and class probabilities. The confidence score is defined as follows (Eq. 1):

\text{Confidence score} = \Pr(\text{Object}) \times \mathrm{IoU}_{\text{true}}^{\text{pred}}

(1)

The confidence score represents the model's certainty that the bounding box contains an object and the accuracy of its prediction of the size and location of the box. If there is no object, then the confidence score is 0. Otherwise, the confidence score is equal to the Intersection over Union (IoU) between the predicted box and the ground truth. YOLO generates multiple bounding box predictions per cell and employs Non-Maximum Suppression to identify the most accurate bounding box. The IoU metric evaluates the precision of object detection by calculating the overlap ratio between the predicted bounding box and the true bounding box. The IoU is defined as follows (Eq. 2):

\mathrm{IoU} = \frac{S_{\text{overlap}}}{S_{\text{union}}}

(2)

where S_overlap is the area of intersection between the predicted bounding box and the true bounding box, and S_union is the total area covered by both bounding boxes.
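The IoU definition and the Non-Maximum Suppression step mentioned above can be illustrated with a minimal Python sketch. It assumes boxes are given as `(xmin, ymin, xmax, ymax)` tuples and uses a simple greedy suppression rule; it is not the implementation used inside YOLO v8, and the threshold of 0.5 is an illustrative choice.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    s_overlap = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    s_union = area_a + area_b - s_overlap
    return s_overlap / s_union if s_union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping detections of one cherry plus a separate cherry.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]

print(round(iou(boxes[0], boxes[1]), 3))   # 81 / 119 = 0.681
print(non_max_suppression(boxes, scores))  # [0, 2]
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is suppressed, so two cherries are counted rather than three.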

Despite their high accuracy, earlier versions of YOLO (v1 to v4) encountered difficulties with images acquired through remote sensing technologies such as drones. These challenges arise from variations in size, large coverage areas, and high object densities [24]. The high-density challenge particularly affects cherry detection in images acquired by mobile devices, where surrounding elements such as leaves and branches can interfere with accurate detection (Fig. 2).

Fig. 2. — The interference of external objects in mobile images. (A) Large leaves blocking the view of the cherries on the branch. (B) Various external objects, including cherries from other branches, fallen leaves, and extraneous branches.

Among the previous YOLO versions, YOLO v5 incorporates the Transformer Prediction Head [31], enhancing object localization accuracy in densely populated scenes. Additionally, it integrates the Convolutional Block Attention Module [32], improving the precision of identifying regions of interest across extensive areas [24]. This approach employs a self-training classifier to improve classification performance. YOLO v6 [27] features an extended backbone and neck design, whereas YOLO v7 [30] adopts a transformer architecture. Both versions aim to increase accuracy and speed compared to their predecessors.

YOLO v8 [33] utilizes an architecture similar to YOLO v5, with key modifications in the backbone, particularly in the module responsible for feature extraction. By incorporating an anchor-free model and a semantic segmentation model, YOLO v8 demonstrates versatility, effectively handling both object detection and semantic segmentation tasks [33].

In this study, the YOLO v8 detection model was adapted using the Ultralytics framework [29], which is built on top of PyTorch 1.7 [34]. The models were developed on a Windows 10 platform using Python 3.7.0 and run on an Nvidia GeForce RTX 3080 GPU (1,440 MHz).

In Peru, we categorized the collected data into 3 sets: training (80%, n = 346), validation (10%, n = 43), and testing (10%, n = 43). Each set served a specific purpose: model development, evaluation/selection, and performance testing, respectively. Additionally, Table 1 provides the network parameters.
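The 80/10/10 split can be reproduced with a seeded shuffle, as in the sketch below. The article does not state how images were assigned to the sets, so the shuffling strategy, the seed, the function name `split_dataset`, and the file names are assumptions; only the proportions and the resulting set sizes come from the text.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items and split them into training, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = round(len(items) * val_fraction)
    n_test = round(len(items) * test_fraction)
    n_train = len(items) - n_val - n_test
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names; 432 = 346 + 43 + 43 images, as reported above.
image_names = [f"img_{i:04d}.jpg" for i in range(432)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 346 43 43
```

Fixing the seed keeps the three sets disjoint and reproducible across runs, which matters when the test set is used only once for final performance testing.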

Table 1.

Initialization parameters of YOLO v8 dense network

Size of input images (pixels)	Batch	Momentum	Initial learning rate	Decay	Epochs
640 × 640	16	0.937	0.1	0.0005	100

To improve the detection accuracy of the model, the parameters were initialized from a pretrained model developed by Ultralytics, which was trained on the COCO (Common Objects in Context) dataset [35]. The COCO dataset contains more than 330,000 images, featuring 2.5 million object instances labeled across 80 different categories. To address the server's memory constraints, we resized the input images to 640 × 640 and set the batch size to 16. We used 100 epochs to better analyze the training process. The momentum, initial learning rate, weight decay regularization, and the other parameters were kept at the default settings in YOLO v8. The model was trained after defining the training process.
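For orientation, the settings in Table 1 map onto training arguments of the Ultralytics Python package roughly as follows. This is a sketch under stated assumptions rather than the authors' script: the model size (`yolov8n.pt`), the dataset file name `cherries.yaml`, and the helper name `train_cherry_detector` are hypothetical, and the `ultralytics` package must be installed before the helper is called.

```python
# Values taken from Table 1 (the initial learning rate is copied as listed).
TRAIN_ARGS = {
    "imgsz": 640,            # input images resized to 640 x 640
    "batch": 16,             # batch size chosen for memory constraints
    "momentum": 0.937,
    "lr0": 0.1,              # initial learning rate
    "weight_decay": 0.0005,  # "Decay" column
    "epochs": 100,
}


def train_cherry_detector(data_yaml, weights="yolov8n.pt", **overrides):
    """Fine-tune a COCO-pretrained YOLO v8 model on a cherry dataset.

    data_yaml: path to a dataset description file (hypothetical here).
    weights:   pretrained checkpoint; the model size is an assumption.
    """
    # Imported lazily so the settings above can be inspected without
    # the ultralytics package being installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **{**TRAIN_ARGS, **overrides})


# Example call (requires ultralytics and a dataset file):
# results = train_cherry_detector("cherries.yaml")
```

Starting from COCO-pretrained weights means only the cherry-specific classes have to be learned from the relatively small annotated set.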

Data augmentation artificially increases the size and diversity of the training set, allowing the model to reduce overfitting on image data [36]. The acquired images were preprocessed using data augmentation techniques, including brightness adjustment and geometric augmentation, specifically scaling, shearing, left-right flipping, and mosaic techniques. These techniques were applied to the cherry mobile images because they reflect real scenarios in coffee plantations. Brightness adjustment can eliminate the noise from ambient light or the low-resolution camera, and geometric augmentation allows for a more accurate representation of the shape and size of cherries, which can vary depending on growth stage, variety, climate, and agricultural practices [37]. Data augmentation was used in the training phase.
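Two of the augmentations named above, brightness adjustment and left-right flipping, are simple enough to show directly. The sketch below is a plain-Python illustration of what these transforms do to an image and its labels; it is not the augmentation code of the training framework, and the function names, the brightness factor, and the tiny example image are assumptions.

```python
def adjust_brightness(image, factor):
    """Scale every RGB channel by factor, clamping to the 0-255 range.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    return [
        [tuple(min(255, max(0, round(c * factor))) for c in pixel) for pixel in row]
        for row in image
    ]


def flip_left_right(image, labels):
    """Mirror the image horizontally and update normalized YOLO labels.

    labels are (class_id, x_center, y_center, width, height) rows; only
    the x coordinate of the box center changes under a horizontal flip.
    """
    flipped = [list(reversed(row)) for row in image]
    new_labels = [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in labels]
    return flipped, new_labels


# Tiny 1 x 2 image with one labeled cherry on its left side.
image = [[(100, 200, 250), (0, 10, 20)]]
labels = [(0, 0.25, 0.5, 0.1, 0.2)]

print(adjust_brightness(image, 1.5))   # [[(150, 255, 255), (0, 15, 30)]]
print(flip_left_right(image, labels))
# ([[(0, 10, 20), (100, 200, 250)]], [(0, 0.75, 0.5, 0.1, 0.2)])
```

The flip example shows why geometric augmentations must transform the labels together with the pixels: the cherry box moves from the left quarter of the image to the right quarter.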

Estimating the total number of cherries at the tree level from information about the number of cherries per branch

Our YOLO model counts the number of cherries on the selected branches. To estimate the number of cherries per tree, we also used data on the number of branches per tree, collected during the field campaign (see Sampling protocols with a picture-based model and manual measurement). We therefore estimate the total cherry load per tree (T) of a coffee plant at any given moment as the product of the total number of productive branches (P) and the average number of cherries per branch, where C_i is the number of cherries on the ith branch (i = 1 upper, i = 2 middle, and i = 3 lower positions) (Eq. 3).

T = P \times \frac{1}{3} \sum_{i=1}^{3} C_{i}

(3)
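Eq. 3 translates directly into code. In the minimal sketch below, the function name `cherries_per_tree` and the example numbers are hypothetical; the per-branch counts could come either from manual counting or from the number of detections returned by the model for each photo.

```python
def cherries_per_tree(productive_branches, cherries_per_branch):
    """Estimate the cherry load of a tree following Eq. 3.

    productive_branches: P, manually counted productive branches.
    cherries_per_branch: counts C_i for the sampled branches
                         (upper, middle, and lower in the protocol).
    """
    if not cherries_per_branch:
        raise ValueError("at least one branch count is required")
    mean_per_branch = sum(cherries_per_branch) / len(cherries_per_branch)
    return productive_branches * mean_per_branch


# Hypothetical tree: 40 productive branches; 30, 45, and 24 cherries
# counted on the upper, middle, and lower sampled branches.
estimate = cherries_per_tree(40, [30, 45, 24])
print(estimate)  # 40 * (99 / 3) = 1320.0
```

Because the estimate is a product, a relative error in either the branch count or the mean count per branch carries over proportionally to the tree-level total.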

The following evaluation metrics were used for tree-level number of cherries estimation: the root mean square error, mean absolute error, mean absolute percentage error (MAPE), and R². To confirm associations between the number of cherries in each branch, the total number of productive branches, and the total of cherries per tree during the validation phase, we conducted a correlation analysis.
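The four tree-level metrics can be computed as in the following sketch. It assumes that R² is the coefficient of determination, 1 minus the ratio of residual to total sum of squares; the article does not specify whether this definition or the squared correlation coefficient was used. The function name and the example values are hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return RMSE, MAE, MAPE (%), and R2 for paired tree-level counts.

    MAPE is undefined when a measured value is 0, so measured counts
    are assumed to be strictly positive.
    """
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100 * sum(abs(e) / m for e, m in zip(errors, measured)) / n
    mean_measured = sum(measured) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r2 = 1 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical manual counts and picture-based estimates for 4 trees.
measured = [100, 200, 300, 400]
predicted = [110, 190, 330, 380]
metrics = regression_metrics(measured, predicted)
print({k: round(v, 2) for k, v in metrics.items()})
# {'rmse': 19.36, 'mae': 17.5, 'mape': 7.5, 'r2': 0.97}
```

RMSE and MAE are expressed in cherries per tree, whereas MAPE and R² are scale-free, which makes the latter two easier to compare across countries with very different average cherry loads.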

Additionally, we monitored the bounding box loss (box_loss), which measures the discrepancy between the predicted box and the ground truth bounding box [25]; the classification loss (cls_loss), representing the difference between the predicted and actual labels [38]; and the distribution focal loss (dfl_loss), which refines the localization of bounding box boundaries [39]. The model's performance was further evaluated using the mean average precision (mAP), which assesses the model's ability to detect and accurately localize boxes around objects within images [40], computed at an IoU threshold of 0.5 (mAP50) and averaged over IoU thresholds from 0.5 to 0.95 (mAP50-95), alongside recall and precision metrics.
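To make the precision and recall metrics concrete, the sketch below matches predicted boxes to ground truth boxes at a fixed IoU threshold and counts true positives, false positives, and false negatives. The greedy one-to-one matching rule, the function names, and the example boxes are assumptions for illustration; evaluation toolkits implement more elaborate variants of the same idea.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(pred_boxes, true_boxes, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truth boxes.

    pred_boxes are assumed to be sorted by descending confidence.
    """
    matched = set()
    true_positives = 0
    for pred in pred_boxes:
        best_index, best_iou = None, iou_threshold
        for index, truth in enumerate(true_boxes):
            if index in matched:
                continue
            overlap = iou(pred, truth)
            if overlap >= best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(pred_boxes) - true_positives
    false_negatives = len(true_boxes) - true_positives
    precision = true_positives / len(pred_boxes) if pred_boxes else 0.0
    recall = true_positives / len(true_boxes) if true_boxes else 0.0
    return precision, recall, false_positives, false_negatives


# Four annotated cherries; the model finds one and adds one spurious box.
true_boxes = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10), (60, 0, 70, 10)]
pred_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
print(precision_recall(pred_boxes, true_boxes))  # (0.5, 0.25, 1, 3)
```

Precision reflects how many detections are correct, whereas recall reflects how many annotated cherries are found, so a detector can score high on one and low on the other.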

Results

Description of the coffee tree

Table 2 presents the summary statistics of the number of cherries per tree based on manual counting. This includes the mean, minimum, maximum, standard deviation, skewness, and coefficient of variation (CV) for each country and variety. The average cherry counts were substantially higher in Peru, ranging from 554 to 1,824, compared to Colombia, where they ranged from 174 to 455. Across all varieties, the maximum count was often more than 10 times the minimum. The CVs exceeding 40% indicate high variation in cherry counts among all varieties and countries. In summary, the sampling campaign captured a wide range of cherry loads, with variety-level means from 174 to 1,824 cherries per tree.

Table 2.

Summary statistics of manually counted coffee cherries per tree. CV is the coefficient of variation.

Country	Coffee varieties	Mean (cherries per tree)	Min (cherries per tree)	Max (cherries per tree)	Standard deviation (cherries per tree)	Skewness	CV (%)	Sample size (total pictures)
Peru	Catimor cogollo verde	1,431.58	70.00	8,085.00	1,049.73	2.16	73.33	1,085
	Catimor cogollo morado	1,016.64	40.00	2,749.50	743.50	0.78	73.13	120
	Caturra	1,824.36	144.00	3,440.00	739.38	-0.20	40.53	28
	Others	972.36	160.00	2,925.00	631.38	1.01	64.93	265
	Tipica	553.86	153.00	1,012.50	233.68	0.09	42.19	56
Colombia	Castillo	413.18	3.33	4,982.67	435.08	2.62	105.30	6,623
	Others	454.79	18.00	3,241.33	471.08	3.05	103.58	377
	Supremo	268.31	4.67	1,241.33	294.92	1.46	109.92	296
	Variedad Colombia	174.04	5.33	912.00	271.65	1.98	156.09	54
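The columns of Table 2 can be computed from a list of per-tree counts as sketched below. The article does not state which estimators were used, so the sample standard deviation (n − 1 denominator) and the moment-based skewness are assumptions, as are the function name and the example counts.

```python
import math


def summarize(counts):
    """Summary statistics of per-tree cherry counts, as in Table 2."""
    n = len(counts)
    mean = sum(counts) / n
    # Sample standard deviation (n - 1 denominator), an assumed choice.
    sd = math.sqrt(sum((x - mean) ** 2 for x in counts) / (n - 1))
    # Moment-based skewness: third central moment over second ** 1.5.
    m2 = sum((x - mean) ** 2 for x in counts) / n
    m3 = sum((x - mean) ** 3 for x in counts) / n
    skewness = m3 / m2 ** 1.5
    return {
        "mean": mean,
        "min": min(counts),
        "max": max(counts),
        "sd": sd,
        "skewness": skewness,
        "cv_percent": 100 * sd / mean,  # coefficient of variation
    }


# Hypothetical counts for 5 trees, with one heavily loaded tree.
stats = summarize([100, 200, 300, 400, 1000])
print({k: round(v, 2) for k, v in stats.items()})
# {'mean': 400.0, 'min': 100, 'max': 1000, 'sd': 353.55,
#  'skewness': 1.14, 'cv_percent': 88.39}
```

A single heavily loaded tree is enough to produce a positive skewness and a CV near 90%, the same pattern seen for most varieties in Table 2.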

The number of productive branches shows the strongest correlation with the total number of cherries at the tree level (r = 0.67, P < 0.001) (Fig. 3). Additionally, the numbers of cherries across different branch positions (upper, middle, and lower) were found to be correlated, though weakly (r = 0.11 for upper and middle, r = −0.025 for lower and upper, and r = 0.49 for middle and lower; P < 0.001 for all) (Fig. 3). This suggests that sampling from various heights is crucial for a more accurate estimation of the total number of cherries at tree level.
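The correlation coefficients reported here are of the kind computed by the following sketch, which assumes that r denotes the Pearson correlation coefficient (the article does not name the coefficient). The function name and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: productive branches and total cherries for 5 trees.
branches = [20, 30, 40, 50, 60]
cherries = [400, 650, 700, 1100, 1150]
print(round(pearson_r(branches, cherries), 3))
```

Values near 1 or −1 indicate a strong linear association, and values near 0 indicate that one variable carries little linear information about the other, as for the counts on upper and lower branches.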

YOLO v8’s performance for cherry detection

YOLO v8 was trained with the images of coffee cherries (green, red, and black). The bounding box, classification, and dfl losses, together with the precision and recall metrics, are shown in Fig. 4.

Fig. 4. — Loss and metric functions during training and validation phases with YOLO v8. It includes box_loss, cls_loss, dfl_loss, metric precision (B), and metric recall (B), applying IoU threshold B = 0.5.

The performance of object detection was characterized by a high precision of 0.85 and a low recall of 0.13 with threshold B set at an IoU of 0.5 (Table 3).

Table 3.

Summary of metrics: Precision, recall, mAP50, and mAP50-95 for YOLO v8 at Epoch 100 with threshold B set at an IoU of 0.5

Epochs	Precision (B)	Recall (B)	mAP50 (B)	mAP50-95 (B)
100	0.85	0.13	0.18	0

Model performance at branch level

Figure 5 shows the comparison between manual counting and picture-based estimations for each branch within the dataset collected from Peru and Colombia. The closest agreement was observed in the lower position (branch 3; R² = 0.68), whereas the weakest was noted in the upper position (branch 1; R² = 0.59). Figure 6 provides an example showcasing the capability of YOLO v8 in detecting cherries on branches, particularly emphasizing its high precision in detecting green cherries.

Fig. 5. — Comparison of manual counts and picture-based estimations. This figure displays the comparison between manually counted cherries and picture-based estimations for (1) upper branch, (2) middle branch, and (3) lower branch (n = 8,904 each). The axes are log10(X+1)-transformed.

Fig. 6. — Green cherry detection using YOLO v8. This figure presents an example of detecting green cherries on a branch with YOLO v8. The detections are highlighted by red boxes, illustrating the precise identification of coffee cherries.

Model performance at tree level

We explore the effectiveness of our model across different coffee varieties and countries at the individual tree level. The model was initially trained using data from varieties such as Catimor cogollo morado and Catimor cogollo verde in Peru. Subsequently, we conducted tests in Colombia, evaluating its performance on varieties including Castillo, Variedad Colombia, Supremo, and others (Fig. 7).

Fig. 7. — Predicted vs. measured number of coffee cherries per tree. This figure is a scatterplot comparing the predicted and actual number of coffee cherries per tree, categorized by country and variety. The axes are log10(X+1)-transformed.

The model's R² in Colombia was 0.71, with variability across coffee varieties (scores ranging from 0.82 to 0.99). The Variedad Colombia variety exhibited the highest performance. Notably, this surpasses the performance in Peru, where the model achieved an R² of 0.59. In Peru, model performance varied among varieties, from 0.32 to 0.77, with the Caturra variety achieving the highest performance (Fig. 7). When we combined all test data from Colombia and Peru, the overall model's performance was as follows: an R² of 0.72, a mean absolute error of 252.63 cherries per tree, a MAPE of 41.74%, and a root mean square error of 621.46 cherries per tree.

Discussion

We demonstrated the high performance (R² = 0.72) of the first geographic-scale, low-cost method for monitoring coffee cherries using pictures taken with smartphones and an object detection model. The results showed that the vast majority of cherries were detected, similar to the findings in [23], which used YOLO v3 for apple detection, and [41], which detected bananas using DeepLab with a CNN. Our model's performance was robust (R² = 0.72, MAPE = 41.74%) and comparable to the methods proposed by [17] (R² = 0.66), the branch-level machine vision system [42] (R² = 0.93), deep learning combined with UAV imagery [16] (MAPE = 31.75%), and manual counting through quadrant-based sampling [9] (R² = 0.92).

The key advancement we achieved was the geographic extension of coffee cherry counting and monitoring, which has mostly been investigated at the field scale, without sacrificing predictive power. This study provides a marked contribution to a geographic-scale in situ phenotypic monitoring for coffee crops in low-income countries, which is scalable in space and time and thus helpful for monitoring the large-scale impacts of climate change on coffee crops. It can provide coffee cherry count data in regions lacking such information [42]. This information is valuable at a research level, as it can serve as input for other endeavors, such as creating strategies for climate change adaptation.

Our approach is notably promising because it diverges from prior research [16,17,42] by testing the model's efficacy in Colombia, distinct from its training environment in Peru. Notably, the model's performance improved, with R² increasing from 0.59 in Peru to 0.71 in Colombia. The dissimilarities in coffee tree characteristics, geographical elevation, and climate conditions between the 2 countries were substantial, encompassing different varieties along with varying average cherry counts. Furthermore, in contrast to earlier studies [7,16,17], our data collection process utilized the efforts of local farmers, avoiding specialized equipment like multispectral cameras and drones. The photographs were taken by actual users with their conventional mobile phones. The object detection model, initially trained on Catimor cogollo verde and Catimor cogollo morado varieties in Peru, was subsequently applied to count cherries in the Tipica, Castillo, Supremo, and Variedad Colombia varieties in Colombia. This approach highlights not only the model's adaptability but also its ability to generalize across different tree types in diverse agricultural settings. To further enhance model performance in a new environment, fine-tuning the developed deep learning model with images collected locally could be considered.

The model exhibits both overestimation and underestimation effects. The detection tends to overestimate the number of cherries in branches with few cherries, possibly due to the larger space captured in the picture, which allows for the inclusion of cherries from adjacent coffee plants. The difficulty of isolating a single coffee plant, attributed to dimorphic branching [43] and the close proximity between plants, contributes to this effect. Conversely, the model tends to underestimate high counts, possibly because of the constrained space within mobile images or situations where only a portion of the branch is visible in the photo. This underestimation becomes noticeable when the count exceeds 1,000 cherries, suggesting that the maximum number of cherries that can fit within a single mobile picture is around 1,000.

The number of tree branches alone may not be sufficient for an accurate estimation of the total real number of cherries. Eight or 9 productive branches on Arabica plants could result in a real yield estimate of up to R² = 0.92 [10]. However, at the plot level, this would require a large number of branches, making it impractical due to the unavailability of a large number of pictures and the time-consuming nature of the process. Moreover, there may be constraints related to connectivity and infrastructure that prevent the implementation of this approach on certain coffee farms.

The productive branches play a crucial role in our model. This result is consistent with findings of nondestructive methods [8,37]. The number of productive branches is counted manually, as the object detection model, YOLO v8, is not suitable because productive branches are concealed beneath large leaves. Adult coffee trees exhibit a leaf area ranging between 22 and 45 m² [37]. Although manual counting of branches may take much less time than counting cherries, we consider that this task also needs to be supplemented with an alternative approach in subsequent steps.

After evaluating the models, we realized that the design of a good photo-sampling protocol was crucial for accurately predicting the total number of cherries and thus advancing our monitoring approach. In Peru, many images were captured of whole trees rather than individual branches. These findings underscore the importance of following established protocols for image acquisition. Our protocol involves taking 3 pictures per tree, capturing the upper, middle, and lower branches, to capture the potential total number of cherries. Ensuring that these pictures accurately reflect the cherry quantity on the tree is paramount. Although our sampling protocol showed some promising results, there is still considerable room for improvement in data acquisition. For instance, determining the optimal angles to capture most of the cherries will be crucial. The lower and middle branches may benefit from an overhead perspective, while the upper branch might require an underneath angle. Factors such as the direction of the sunlight, image blur, distortion (resulting from swift camera movement), and distance from the branches can also affect accuracy.

Identifying the main sources of error and refining the protocol can increase precision, but adding more photos or complexity may make local farmers more reluctant to use the method in real-world applications. The best trade-off between simplicity and accuracy needs to be identified together with local farmers. In addition, future studies could focus on how to reliably scale estimates of the total number of cherries from the individual tree level to plot-level yields, including the exploration of other species such as Robusta (C. canephora).

Acknowledgments

We are grateful to the anonymous reviewers for their constructive comments. We thank S. Webb, C. Feil, A. Eltzinger, R. Gautron, F. Viviana Narvaez, and J. Hoyos for their contributions during the development of this research, and Professor A. Alvarez Bustos, A. Rusca, C. Valor, and K. Idarraga for labeling the images. We would also like to thank K. Suzuki for his technical support.

Funding: This work was supported by the Brandenburg University of Technology Cottbus-Senftenberg (BTU), Graduate Research School (GRS) cluster project “Integrated analysis of Multifunctional Fruit production landscape to promote ecosystem services and sustainable land-use under climate change” (grant number BTUGRS2018_19) and the Croppie funded by Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) (grant number 81275837).

Author contributions: J.C.R.P. conducted data analysis and wrote the manuscript. C.B., E.R., and M.R. designed the study. D.L-S. and P.G.S. supported data collection. All authors contributed to the final version of the manuscript draft.

Competing interests: The authors declare that there is no conflict of interest regarding the publication of this article.

Data Availability

Some representative pictures and the Python script used for the study are available at the GitHub repository: https://github.com/j-river1/Croppie.

References

1. Food and Agriculture Organization of the United Nations (FAO). FAOSTAT. [accessed 1 May 2023] https://www.fao.org/faostat/en/#home
2. Gay C, Estrada F, Conde C, Eakin H, Villers L. Potential impacts of climate change on agriculture: A case of study of coffee production in Veracruz, Mexico. Clim Chang. 2006;79(3):259–288.
3. Schroth G, Laderach P, Dempewolf J, Philpott S, Haggar J, Eakin H, Castillejos T, Moreno JG, Pinto LS, Hernandez R, et al. Towards a climate change adaptation strategy for coffee communities and ecosystems in the Sierra Madre de Chiapas, Mexico. Mitig Adapt Strateg Glob Chang. 2009;14(7):605–625.
4. Zullo J, Pinto HS, Assad ED, Ávila AMH. Potential for growing Arabica coffee in the extreme south of Brazil in a warmer world. Clim Chang. 2011;109(3):535–548.
5. Cerda R, Avelino J, Gary C, Tixier P, Lechevallier E, Allinne C. Primary and secondary yield losses caused by pests and diseases: Assessment and modeling in coffee. PLoS One. 2017;12(1):1–17.
6. Bunn C, Läderach P, Ovalle Rivera O, Kirschke D. A bitter cup: Climate change profile of global production of Arabica and Robusta coffee. Clim Chang. 2015;129(1–2):89–101.
7. Maskell G, Chemura A, Nguyen H, Gornott C, Mondal P. Integration of sentinel optical and radar data for mapping smallholder coffee production systems in Vietnam. Remote Sens Environ. 2021;266: Article 112709.
8. Castro-Tanzi S, Dietsch T, Urena N, Vindas L, Chandler M. Analysis of management and site factors to improve the sustainability of smallholder coffee production in Tarrazú, Costa Rica. Agric Ecosyst Environ. 2012;155:172–181.
9. Upreti G, Bittenbender H, Ingamells JL. Rapid estimation of coffee yield. In: International Scientific Association of Coffee, editor. Proceedings of the 1991 ASIC Conference on Coffee Science. San Francisco (CA): ASIC; 1991. p. 585–593.
10. Castro-Tanzi S, Flores M, Wanner N, Dietsch TV, Banks J, Ureña-Retana N, Chandler M. Evaluation of a non-destructive sampling method and a statistical model for predicting fruit load on individual coffee (Coffea arabica) trees. Sci Hortic. 2014;167:117–126.
11. Peeters LYK, Soto-Pinto L, Perales H, Montoya G, Ishiki M. Coffee production, timber, and firewood in traditional and Inga-shaded plantations in southern Mexico. Agric Ecosyst Environ. 2003;95(2–3):481–493.
12. Soto-Pinto L, Perfecto I, Castillo-Hernandez J, Caballero-Nieto J. Shade effect on coffee production at the northern Tzeltal zone of the state of Chiapas, Mexico. Agric Ecosyst Environ. 2000;80(1):61–69.
13. Fanelli Carvalho H. The effect of bienniality on genomic prediction of yield in arabica coffee. Euphytica. 2020;216(6):101.
14. Oliveira Aparecido LE, Souza Rolim G, Camargo Lamparelli RA, Souza PS, Santos ER. Agrometeorological models for forecasting coffee yield. Agron J. 2017;109(1):249–258.
15. Kouadio L, Deo RC, Byrareddy V, Adamowski JF, Mushtaq S, Phuong Nguyen V. Artificial intelligence approach for the prediction of Robusta coffee yield using soil fertility properties. Comput Electron Agric. 2018;155:324–338.
16. Barbosa BDS, Ferraz GAES, Costa L, Ampatzidis Y, Vijayakumar V, Santos LM. UAV-based coffee yield prediction utilizing feature selection and deep learning. Smart Agric Technol. 2021;1: Article 100010.
17. Rodríguez JP, Corrales DC, Aubertot JN, Corrales JC. A computer vision system for automatic cherry beans detection on coffee trees. Pattern Recogn Lett. 2020;136:142–153.
18. Ryo M, Schiller J, Stiller S, Rivera Palacio JC, Mengsuwan K, Safonova A, Wei Y. Deep learning for sustainable agriculture needs ecology and human involvement. J Sustain Agric Environ. 2023;2(1):40–44.
19. Servicio Nacional de Meteorología e Hidrología del Perú (SENAMHI). Monitoreo Meteorológico Diario. SENAMHI. [accessed 1 May 2023] https://www.senamhi.gob.pe/
20. Instituto de Hidrología, Meteorología y Estudios Ambientales (IDEAM). Tiempo y el Clima. IDEAM. [accessed 1 May 2023] http://www.ideam.gov.co
21. Maheswari P, Raja P, Apolo-Apolo OE, Pérez-Ruiz M. Intelligent fruit yield estimation for orchards using deep learning based semantic segmentation techniques—A review. Front Plant Sci. 2021;12:10.
22. Tkachenko M, Malyuk M, Holmanyuk A, Liubimov N. Label Studio: Data labeling software. Label Studio. [accessed 1 May 2023] https://github.com/heartexlabs/label-studio
23. Tian Y, Yang G, Wang Z, Wang H, Li E, Liang Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput Electron Agric. 2019;157:417–426.
24. Zhu X, Lyu S, Wang X, Zhao Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. arXiv. 2021. doi:10.48550/arXiv.2108.11539
25. Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, real-time object detection. arXiv. 2016. doi:10.48550/arXiv.1506.02640
26. Redmon J, Farhadi A. YOLO9000: Better, faster, stronger. arXiv. 2016. doi:10.48550/arXiv.1612.08242
27. Li C. YOLOv6 v3.0: A full-scale reloading. arXiv. 2023. doi:10.48550/arXiv.2301.05586
28. Bochkovskiy A, Wang C-Y, Liao H-YM. YOLOv4: Optimal speed and accuracy of object detection. arXiv. 2020. doi:10.48550/arXiv.2004.10934
29. Ultralytics. YOLOv5. Ultralytics. [accessed 1 May 2023] https://docs.ultralytics.com/yolov5/
30. Wang C-Y, Bochkovskiy A, Liao H-YM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv. 2022. doi:10.48550/arXiv.2207.02696
31. Dosovitskiy A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv. 2020. doi:10.48550/arXiv.2010.11929
32. Woo S, Park J, Lee J-Y, Kweon IS. CBAM: Convolutional Block Attention Module. arXiv. 2018. doi:10.48550/arXiv.1807.06521
33. Terven J, Cordova-Esparza D. A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. arXiv. 2024. doi:10.3390/make5040083
34. Paszke A. PyTorch: An imperative style, high-performance deep learning library. arXiv. 2019. doi:10.48550/arXiv.1912.01703
35. Lin T-Y. Microsoft COCO: Common Objects in Context. arXiv. 2014. doi:10.48550/arXiv.1405.0312
36. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Pereira F, Burges CJ, Bottou L, Weinberger KQ, editors. Proceedings of the 26th conference on Neural Information Processing Systems. Curran Associates, Inc.; 2012. p. 1–9.
37. Wintgens JN (Ed). Coffee: Growing, processing, sustainable production: A guidebook for growers, processors, traders, and researchers. Weinheim (Germany): Wiley-VCH; 2004.
38. Ying Y, Zhou D-X. Unregularized online learning algorithms with general loss functions. Appl Comput Harmon Anal. 2017;42(2):224–244.
39. Thakur A, Thapar D, Rajan P, Nigam A. Deep metric learning for bioacoustic classification: Overcoming training data scarcity using dynamic triplet loss. J Acoust Soc Am. 2019;146(1):534–547.
40. Babenko A, Slesarev A, Chigorin A, Lempitsky V. Neural codes for image retrieval. arXiv. 2014. doi:10.48550/arXiv.1404.1777
41. Wu F, Yang Z, Mo X, Wu Z, Tang W, Duan J, Zou X. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms. Comput Electron Agric. 2023;209:107827.
42. Ramos PJ, Prieto FA, Montoya EC, Oliveros CE. Automatic fruit count on coffee branches using computer vision. Comput Electron Agric. 2017;137:9–22.
43. Farah A (Ed). Introduction to coffee plant and genetics. London (UK): The Royal Society of Chemistry; 2019.
