FIGURE 2.
Pot detection model and post-processing. Input images were resized to 416 × 416 pixels and fed into the YOLO v3 model, which produced a set of predicted boxes. Semantic filtering was then applied to retain the valid boxes and to assign the local variety and plant indices. C indicates the tensor concatenation (splicing) operation, N = 3 indicates the three output detection channels at different scales, ‘pot’ denotes a detected pot, ‘C-’ is the variety index, and ‘P-’ is the plant index within an individual variety.
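As a rough illustration of what the three detection scales imply for a 416 × 416 input, the sketch below computes the grid size of each output and the total number of candidate boxes before any filtering. It assumes the standard YOLO v3 configuration (strides of 32, 16 and 8, with three anchor boxes per grid cell); the caption confirms only the input size and N = 3, so the strides, the anchor count and the function names here are assumptions for illustration, not the authors' settings.

```python
# Sketch only: the strides and anchors-per-cell are the standard YOLO v3
# defaults, assumed here; the caption states only 416 x 416 input and N = 3.
INPUT_SIZE = 416
STRIDES = (32, 16, 8)        # assumed downsampling factor of each detection scale
ANCHORS_PER_CELL = 3         # assumed number of anchor boxes per grid cell


def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return the side length of the output grid at each detection scale."""
    return [input_size // s for s in strides]


def total_candidate_boxes(input_size=INPUT_SIZE, strides=STRIDES,
                          anchors=ANCHORS_PER_CELL):
    """Count all boxes predicted across scales, before any filtering."""
    return sum(g * g * anchors for g in grid_sizes(input_size, strides))


if __name__ == "__main__":
    print(grid_sizes())             # [13, 26, 52]
    print(total_candidate_boxes())  # 10647
```

Under these assumptions the network emits several thousand candidate boxes per image, which is why a filtering step is needed before pot-level indices can be assigned.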