TABLE 5.
Reference | User input-preprocessing | Segmentation | Feature extraction | Dimensionality reduction | Classification | Volume estimation | Datasets and performance |
---|---|---|---|---|---|---|---|
(32) | — | — | Locality-Constrained Linear Coding, deep features with DeCAF | Bag-of-Words | SVM | — | Dishes (deep features) ACC = 72.88% |
(6) | Drawing polygons | Manual segmentation using polygons | CNN | — | CNN | — | Food segmentation: UNIMIB 2016 Recall = 71.4%, Precision = 73.4%, F-measure = 72.4%. Food classification: UNIMIB 2016 ACC = 78% |
(17) | — | Food localization using CNN | CNN | — | CNN | — | Food segmentation: UEC-Food 256 Precision = 54.33%, Recall = 50.86%; Egocentric Food Precision = 17.38%, Recall = 8.72%. Food classification: UEC-Food 256 ACC = 63.16%; Egocentric Food ACC = 90.90% |
(76) | — | — | CNN | — | SVM | — | PFID ACC = 70.13% |
(48) | Top-view photo including credit card | Color-pixel-based k-means clustering and GrabCut | — | — | CNN | Based on the size of the reference object | UEC-Food 100 ACC = 75% |
(88) | — | — | — | — | CNN | — | Food-101 ACC = 88.28%; UEC-Food 100 ACC = 81.45%; UEC-Food 256 ACC = 76.17% |
(120) | — | — | CNN including semantics-aware features | — | CNN | — | Food-101 ACC = 72.11% |
(37) | — | — | CNN exploiting the joint relation between food and ingredient labels through multi-task learning | — | CNN | — | Food classification: VIREO Food-172 ACC = 82.06%; UEC-Food 100 ACC = 82.12%. Ingredient recognition: VIREO Food-172 F-measure = 67.17%; UEC-Food 100 F-measure = 70.72% |
(77) | — | — | CNN | — | CNN | — | UEC-Food 100 ACC = 60.9% |
(121) | — | — | Covariances of convolutional layer feature maps of CNN | — | CNN | — | Food-101 ACC = 58.65% |
(122) | — | — | CNN | — | CNN | — | UEC-Food 100 ACC = 76.3%; Food-101 ACC = 77.4% |
(123) | — | — | Feature vector from ensemble of 3 CNNs | — | CNN | — | Food-101 ACC = 71.12% |
(124) | — | Food border defined by user with a circle | CNN | — | CNN | — | FooDD ACC = 94.11% |
(125) | — | — | CNN | — | Multi-Task Triplet Network | — | UEC-Food 256 MAP = 31.7% |
(110) | — | — | CNN | — | CNN (NutriNet) | — | UNIMIB 2016 ACC = 86.39% |
(126) | — | — | CNN | — | CNN | — | Food-101 ACC = 87.96%; UEC-Food 100 ACC = 86.51%; UEC-Food 256 ACC = 78.60% |
(127) | — | — | CNN | — | CNN | — | Food-101 ACC = 86.97% |
(92) | — | — | CNN (DualNet) | — | Ensemble of CNNs | — | UEC-Food 100 ACC = 49.19% |
(39) | — | — | CNN | — | SVM | — | NTUA-Food 2017 ACC = 85.94% |
(128) | — | — | CNN | — | CNN (Inception V3) | — | Food-101 ACC = 81.65% |
(103) | — | CNN (Mask R-CNN) | CNN (ResNet50) | — | CNN | CNN (VolumeNet) | Food segmentation: MAP = 64.7%. Food classification: Madima 2017 ACC = 93.33% |
(129) | — | — | CNN | — | CNN (WISeR) | — | UEC-Food 100 ACC = 89.58%; UEC-Food 256 ACC = 83.15%; Food-101 ACC = 90.27% |
(130) | — | — | — | — | CNN | — | UNIMIB 2016 ACC = 77.5% |
(22) | — | — | — | — | CNN (VGG, ResNet-50, Wide ResNet-50, Inception V3) | — | Food classification: ChinaFood-100 ACC = 78.26% (Inception V3). Nutrient estimation: protein, fiber, vitamin, calcium, and iron; MAPE is approximately 65% |
(36) | — | — | CNN (VGG) | — | SVM | — | Instagram800K ACC = 72.8% |
(40) | Top view image and side view image, a coin as a fiducial marker | GrabCut and Faster R-CNN | CNN | — | CNN | — | ECUST Food Dataset Mean Error = ±20% |
(41) | — | — | CNN (ResNet) | — | CNN (ResNet) | — | Food524DB ACC = 69.52% |
(45) | — | — | Multi-task CNN (ResNet) | — | CNN (ResNet) | — | Eating Occasion Image to Food Energy ACC = 88.67% MAE = 56.82 (kcal) |
(46) | — | Faster R-CNN provides bounding boxes with a foodness score | CNN (DenseNet-121) | — | Multi-task CNN (DenseNet-121) | — | Food localization: UEC-Food 100 Precision = 82%, Recall = 86%, F-measure = 84%; UEC-Food 256 Precision = 94%, Recall = 88%, F-measure = 91%; VIPERFoodNet Precision = 79%, Recall = 64%, F-measure = 71%. Food classification: Food-101 ACC = 80%; UPMC Food-101 ACC = 69%; UEC-Food 100 ACC = 81%; UEC-Food 256 ACC = 72%; VIPERFoodNet ACC = 72% |
(93) | — | — | — | — | Ensemble of CNNs (VGG16, VGG19, GoogLeNet, ResNet, Inception V3) with 5 different combination rules (minimum, average, median, maximum, product) | — | Food-101 ACC = 84.28%; UEC-Food 100 ACC = 84.52%; UEC-Food 256 ACC = 77.20% |
(131) | — | — | Category- and ingredient-oriented feature extraction based on CNN (VGG-16); fusion of the 2 kinds of features | — | Adapted CNN | — | Food-101 ACC = 55.3%; VIREO Food-172 ACC = 75.1%; ChineseFoodNet ACC = 66.1% |
(97) | Two meal images from 2 different viewing angles (90 and 75 degrees from the table's plane) or a short video, fiducial marker | Automated segmentation based on Mask R-CNN; semi-automatic segmentation based on a region growing and merging algorithm | — | — | CNN (Inception V3) | — | Madima database segmentation: Fmin = 83.9%, Fsum = 94.4%. Madima database food recognition: ACC = 57.1% |
(99) | — | Multiple hypothesis segmentation: salient region detection, multi-scale segmentation and fast rejection | Color, texture, and local neighborhood pixel features | — | ANN | — | UNIMIB 2016 ACC = 95.9% |
(117) | — | — | — | — | Multi-relational graph convolutional network, termed mRGCN (ResNet-50) | — | VIREO Food-172 ACC = 24.2% on unseen ingredients; UEC-Food 100 ACC = 17.9% on unseen ingredients |
(116) | — | — | First approach: Bag-of-Words with texture (binary patterns), color, SURF, and geometry features | First approach: Bag-of-Words | First approach: ANN; second approach: CNNs such as GoogLeNet, Inception-v3, ResNet101 | — | 16 categories selected from UEC-Food 256; ACC = 93% (ResNet) |
(132) | — | Faster R-CNN provides bounding boxes (maximum 5 per image) | — | — | CNN (VGGNet) | — | UEC-Food 100 MAP = 17.5%; UEC-Food 256 MAP = 10.5% |
(38) | — | — | — | — | CNN (GoogLeNet) | — | FOOD-5K ACC = 99.2%; Food-11 ACC = 83.6% |
(133) | — | — | First approach: feature extraction from AlexNet and VGG16 | — | First approach: SVM; second approach: fine-tuning CNN (ResNet50) | — | First approach: FOOD-5K ACC = 99.00%; Food-11 ACC = 89.33%; Food-101 ACC = 62.44%. Second approach: Food-101 ACC = 79.86% |
(134) | — | — | — | — | 5-Layer CNN | — | UEC-Food 100 ACC = 60.90% |
(100) | — | Canny edge detection, multi-scale segmentation, fast rejection of background pixels | Color, texture, SIFT, and SURF features | — | 3-Layer ANN | — | UNIMIB 2016 ACC = 94.5% |
(135) | Data augmentation by translations, rotations, shearing, zooming, and flipping | — | — | — | CNN (Inception-v3) | Ingredients and nutritional value estimation from vector space embeddings of words (text data from the internet) | Food-101 ACC = 80.0% |
(98) | — | Local variation segmentation | Color descriptors: SCD, DCD. Texture descriptors: EFD and GFD. Local descriptors: SIFT and MDSIFT. | — | Multi-kernel SVM | — | UNIMIB 2016 ACC = 93.9% |
(86) | — | — | Pretrained CNNs (GoogLeNet and ResNet152) | — | NB, SVM-RBF, SVM-Poly, ANN, RF | — | FOOD-5K (ResNet-152 and SVM-RBF) ACC = 98.8%; Food-11 (ResNet-152 and ANN) ACC = 91.34%; Food-101 (ResNet-152 and SVM-RBF) ACC = 64.98% |
(136) | — | Weakly supervised CNN with a new pooling technique, incorporating a class activation map for graph-based segmentation (VGG-16) | — | — | CNN (VGG-16) | — | Food-101 ACC = 74.02% |
(137) | — | JSEG algorithm consisting of color quantization and spatial segmentation | — | — | — | — | UNIMIB 2016 F-measure = 58% |
ACC, accuracy; ANN, artificial neural network; CNN, convolutional neural network; DCD, dominant color descriptor; EFD, entropy-based categorization and fractal dimension estimation; GFD, Gabor-based image decomposition and fractal dimension estimation; MAP, mean average precision; MAPE, mean absolute percentage error; MDSIFT, multi-scale dense SIFT; NB, naive Bayes; PFID, Pittsburgh fast-food image dataset; Poly, polynomial; RBF, radial basis function; RF, random forest; SCD, scalable color descriptor; SIFT, scale-invariant feature transform; SURF, speeded up robust features; SVM, support vector machine.
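A recurring pattern in the table (e.g. rows (76), (36), (39), (86), (133)) is to use a pretrained CNN as a fixed feature extractor and train a classical SVM on the resulting embeddings. A minimal sketch of that two-stage pipeline is below; since a real pretrained network is not loaded here, synthetic Gaussian clusters stand in for CNN feature vectors, and the class count and dimensions are illustrative assumptions.

```python
# Sketch of the "CNN features + SVM classifier" pipeline used by several
# surveyed methods. Synthetic 512-D feature vectors stand in for real
# CNN embeddings; 5 clusters play the role of 5 food classes.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 600 "images", each represented by a 512-D "CNN feature" vector
X, y = make_blobs(n_samples=600, n_features=512, centers=5,
                  cluster_std=3.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# RBF-kernel SVM on standardized features, as in e.g. row (86)'s SVM-RBF
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

In practice the feature vectors would come from the penultimate layer of a network such as ResNet-152 or GoogLeNet, and only the SVM is trained on the food dataset.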
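Row (93) fuses an ensemble of five CNNs with five score-combination rules: minimum, average, median, maximum, and product. The sketch below shows how those rules operate on per-model class-probability matrices; random softmax-normalized scores stand in for real CNN outputs, and the model/sample/class counts are arbitrary.

```python
# Sketch of the five combination rules from row (93): per-class softmax
# scores from several networks are fused element-wise across models,
# then argmax over classes yields the ensemble prediction.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples, n_classes = 5, 4, 3

# scores[m, i, c]: model m's softmax probability of class c for sample i
logits = rng.normal(size=(n_models, n_samples, n_classes))
scores = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

rules = {
    "minimum": scores.min(axis=0),
    "average": scores.mean(axis=0),
    "median":  np.median(scores, axis=0),
    "maximum": scores.max(axis=0),
    "product": scores.prod(axis=0),
}
# One predicted class label per sample, per rule
predictions = {name: fused.argmax(axis=1) for name, fused in rules.items()}
```

The average and product rules are the usual defaults; minimum and product are stricter (a single unconfident model suppresses a class), while maximum lets any one model dominate.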
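Row (135) lists its preprocessing as data augmentation by translations, rotations, shearing, zooming, and flipping. A minimal sketch of those five geometric transforms using `scipy.ndimage` follows; a random 64x64 grayscale array stands in for a food photograph, and the specific shift, angle, shear, and zoom values are illustrative.

```python
# Sketch of the augmentation scheme in row (135): each training image is
# perturbed by translation, rotation, shearing, zooming, and flipping.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a grayscale food photo

translated = ndimage.shift(image, shift=(4, -3), mode="nearest")
rotated = ndimage.rotate(image, angle=15, reshape=False, mode="nearest")
# Shear along x: output pixel (r, c) samples input pixel (r, c + 0.2*r)
shear = np.array([[1.0, 0.0], [0.2, 1.0]])
sheared = ndimage.affine_transform(image, shear, mode="nearest")
# Zoom in by 1.25x, then crop back to the original 64x64 size
zoomed = ndimage.zoom(image, 1.25)[:64, :64]
flipped = np.fliplr(image)  # horizontal flip

augmented = [translated, rotated, sheared, zoomed, flipped]
```

Each transform yields a new 64x64 training example with the same class label, which is the point of augmentation: the network sees label-preserving variations it would otherwise never encounter.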