Summary of segmentation methods used for the food-segmentation task in image-based food-recognition systems. (A) Initial image of a plate with the user's meal. (B) Manual segmentation of initial image. The user manually draws a line, border, or polygon around each food item. (C) Hierarchical segmentation of initial image. Hierarchical segmentation starts with an initial over-segmentation, where almost every pixel defines a different region, and gradually merges these regions into coarser segments based on a specific criterion. (D) Saliency-aware segmentation of initial image. Saliency-aware segmentation uses spatial, color, and statistical features of food areas to enhance food regions and suppress nonfood regions. (E) Thresholding segmentation of initial image. A binary image is created in which all pixels with color intensity above the predefined threshold are shown in one color (e.g., white) and may indicate the background area, and all pixels below the threshold are shown in another color and may indicate the food area. (F) Clustering segmentation of initial image. Pixels of food items are grouped into clusters, each depicted by a different color (e.g., 4 clusters/colors are created in this example). (G) Segmentation of initial image based on Sobel operator. Edges of food items can be estimated by applying the Sobel operator to every pixel of the image (i.e., convolving the matrix on the left with the 3 × 3 neighborhood of the image around each pixel). After the convolution, areas in the image where the color intensity of the pixels changes rapidly denote the border of a food item. (H) Color/texture-based segmentation of initial image. Color/texture-based segmentation assumes that regions of pixels that share similar color/texture properties in the image correspond to meaningful objects. The first cluster depicts the plate and the background. (I) Color/texture-based segmentation of initial image. The second cluster depicts the sauce. 
(J) Color/texture-based segmentation of initial image. The third cluster depicts the spaghetti. (K) Thermal clustering of initial image. Dynamic thermal thresholding can be applied to discriminate food from the plate and the background, since food is hotter than the other elements of the image. (L) Region-based segmentation of initial image. Starting points (seeds) of different areas are depicted with dots. The algorithm then expands the initial areas around the seeds by adding neighboring pixels that fulfill a criterion based on a homogeneity metric. (M) Segmentation of initial image based on CNNs. CNNs are used for food localization by identifying the pixels that might belong to a food item. A binary image is created in which all pixels categorized by the CNN as a food item are shown in white, whereas all remaining pixels, categorized as background, are shown in black. CNN, convolutional neural network.
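
To make three of the panel descriptions concrete, the following is a minimal sketch of thresholding (E), Sobel edge estimation (G), and seeded region growing (L) on a toy grayscale image. It is illustrative only and is not the implementation behind the figure: the pixel values, the function names (`threshold`, `sobel_magnitude`, `region_grow`), the threshold of 128, and the homogeneity criterion (absolute difference from the seed intensity within a tolerance) are all assumptions chosen for the example.

```python
# Toy grayscale image (0-255): a dark "food" blob on a bright "plate".
# All pixel values are invented for illustration.
IMAGE = [
    [200, 200, 200, 200, 200],
    [200,  60,  60,  60, 200],
    [200,  60,  50,  60, 200],
    [200,  60,  60,  60, 200],
    [200, 200, 200, 200, 200],
]

# Standard 3 x 3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def threshold(image, t):
    """Panel E: 1 where intensity > t (assumed background), else 0 (assumed food)."""
    return [[1 if px > t else 0 for px in row] for row in image]


def sobel_magnitude(image):
    """Panel G: gradient magnitude at interior pixels; border pixels are left at 0.

    The kernel is applied without flipping (correlation); for Sobel this only
    changes the sign of the gradient, so the magnitude is unaffected.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    px = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * px
                    gy += SOBEL_Y[dy][dx] * px
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def region_grow(image, seed, tol):
    """Panel L: grow a region from `seed` over 4-connected neighbors.

    Assumed homogeneity criterion: a pixel joins the region if its intensity
    differs from the seed intensity by at most `tol`.
    """
    h, w = len(image), len(image[0])
    ref = image[seed[0]][seed[1]]
    region = {seed}
    stack = [seed]
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in region:
                if abs(image[ny][nx] - ref) <= tol:
                    region.add((ny, nx))
                    stack.append((ny, nx))
    return region


if __name__ == "__main__":
    mask = threshold(IMAGE, 128)
    edges = sobel_magnitude(IMAGE)
    food = region_grow(IMAGE, (2, 2), 20)
    print(mask[2])          # binary row through the middle of the blob
    print(edges[1][1] > 0)  # strong response at the blob corner
    print(len(food))        # number of pixels in the grown region
```

In this toy case the gradient magnitude is zero at the center of the blob, where the neighborhood is nearly uniform, and large at its corner, where intensity changes abruptly. Real food images would typically need color channels, smoothing, and a data-driven threshold or tolerance.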