2024 Nov 15;26:e51432. doi: 10.2196/51432

Table 1.

Overview of notable food identification systems: classifier algorithm, selected features, number of classes, name of the food dataset (the authors' own dataset if none is specified), and accuracy results^a.

| Study, year | Project or team | Classifier | Features | Classes (dataset) | Accuracy (%) |
|---|---|---|---|---|---|
| Shroff et al [18], 2008 | DiaWear | Neural network | Color, size, shape, and texture | 4 | ~75 |
| Chen et al [19], 2009 | PFID^b | SVM^c | Color; BoSIFT^d | 61 (PFID) | ~11; ~24 |
| Taichi and Keiji [21], 2009 | UEC^e | MKL^f | Color, texture, and SIFT^g | 50 | 61.34 |
| Hoashi et al [52], 2010 | UEC | MKL | BoF^h, Gabor^i, color, HOG^j, and texture | 85 | 62.53 |
| Yang et al [20], 2010 | PFID | SVM | Pairwise local features | 61 (PFID) | 78.00 |
| Zhu et al [31], 2010 | TADA^k | SVM with Gaussian radial basis kernel | Color and texture | 19 | 97.20 |
| Kong and Tan [53], 2011 | DietCam | Multiclass SVM | Nearest-neighbor Gaussian region detector and SIFT | 61 (PFID) | 84.00 |
| Bosch et al [22], 2011 | TADA | SVM | Color, entropy, Gabor, Tamura^l, SIFT, Haar wavelet^m, steerable^n, and DAISY^o | 39 | 86.10 |
| Matsuda et al [36], 2012 | UEC | MKL-SVM | HOG, SIFT, Gabor, color, and texture | 100 (UEC-Food100) | 55.80 |
| Anthimopoulos et al [23], 2014 | GoCARB | SVM | HSV^p-SIFT, optimized BoF, and color moment invariant | 11 | 78.00 |
| He et al [54], 2014 | TADA | k-nearest neighbors | DCD^q, SIFT, MDSIFT^r, and SCD^s | 42 | 65.4 |
| Pouladzadeh et al [32], 2014 | ^t | SVM | Color, texture, size, and shape | 15 | 90.41 |
| Kawano and Yanai [35], 2014 | UEC | Pretrained CNN^u | | 100 (UEC-Food100) | 72.3 |
| Yanai and Kawano [39], 2015 | UEC | Deep CNN | | 100 (UEC-Food100) | 78.77 |
| Christodoulidis et al [40], 2015 | GoCARB | Patch-wise CNN | | 7 | 84.90 |
| Myers et al [27], 2015 | Google | GoogLeNet | | 101 | 79.00 |
| Liu et al [41], 2016 | DeepFood | | | Food-101; UEC-256 | 77.40; 54.70 |
| Singla et al [42], 2016 | | GoogLeNet | | 11 | 83.60 |
| Hassannejad et al [43], 2016 | | InceptionV3^v | | 101 (Food-101); 100 (UEC-Food100); 256 (UEC-Food256) | 88.28; 81.45; 76.17 |
| Ciocca et al [44], 2017 | | VGG^w | | 73 (UNIMIB2016) | 78.30 |
| Mezgec and Koroušić Seljak [45], 2017 | | NutriNet (modified AlexNet^x) | | 73 (UNIMIB2016) | 86.72 |
| Pandey et al [55], 2017 | | Ensemble net | | 101 (Food-101) | 72.10 |
| Martinel et al [56], 2018 | | WISeR^y | | 101 (Food-101); 100 (UEC-Food100); 256 (UEC-Food256) | 88.72; 79.76; 86.71 |
| Jiang et al [57], 2020 | | MSMVFA^z | | 101 (Food-101); 172 (VireoFood-172); 208 (ChineseFoodNet) | ~90.47; 90.61; 81.94 |
| Lu et al [58], 2020 | GoCARB | Modified InceptionV3 | | 298 generic food, subgroups, and fine-grained (MADiMA^aa) | 65.80; 61.50; 57.10 |
| Wu et al [59], 2021 | | Modified AlexNet | | 22 styles of Bento sets | 96.30 |

^a Note that convolutional neural network-based classifiers have no feature entry, as they extract features autonomously.

^b PFID: Pittsburgh Fast-Food Image Dataset.

^c SVM: support vector machine.

^d BoSIFT: bag of scale-invariant feature transform.

^e UEC: University of Electro-Communications.

^f MKL: multiple kernel learning, a machine learning technique that combines multiple kernels (similarity functions) to improve the performance and flexibility of kernel-based models such as support vector machines.
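To illustrate the combination idea, the sketch below evaluates a fixed convex mixture of two common base kernels in plain Python. This is a simplification: in true MKL the mixture weights are learned jointly with the SVM, whereas here they are constants chosen for illustration.

```python
import math

def rbf(x, y, gamma=1.0):
    # Gaussian radial-basis-function kernel between two feature vectors.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def linear(x, y):
    # Plain dot-product (linear) kernel.
    return sum(a * b for a, b in zip(x, y))

def combined(x, y, w_rbf=0.6, w_lin=0.4):
    # MKL-style convex combination of the two base kernels.
    # Illustrative only: real MKL optimizes w_rbf and w_lin.
    return w_rbf * rbf(x, y) + w_lin * linear(x, y)
```

The combined function is itself a valid kernel (a nonnegative sum of kernels), so it can be plugged into any kernel-based classifier unchanged.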

^g SIFT: scale-invariant feature transform.

^h BoF: bag of features.
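The bag-of-features representation used by several systems in the table pools a variable number of local descriptors (eg, SIFT) into a fixed-length histogram over a learned codebook. A minimal sketch, with a toy codebook and descriptors standing in for real SIFT output:

```python
def bof_histogram(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword (squared
    Euclidean distance) and count assignments, yielding a fixed-length
    image representation regardless of how many descriptors were found."""
    hist = [0] * len(codebook)
    for d in descriptors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, codebook[k])),
        )
        hist[best] += 1
    return hist
```

In a real pipeline the codebook comes from clustering (eg, k-means over training descriptors) and the histogram is normalized before being fed to an SVM.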

^i Gabor: a texture feature extracted with Gabor filters, named after Dennis Gabor.
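A Gabor filter is a sinusoidal carrier under a Gaussian envelope, tuned to one orientation and spatial frequency; texture features are typically the responses of a bank of such filters. A minimal sketch evaluating the real (even) 2D Gabor function at a point, with illustrative parameter values:

```python
import math

def gabor(x, y, theta=0.0, lam=4.0, sigma=2.0):
    """Value of a real, even 2D Gabor filter at (x, y): a cosine carrier
    of wavelength lam, oriented at angle theta, under an isotropic
    Gaussian envelope of width sigma."""
    # Rotate coordinates so the carrier runs along the filter orientation.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    carrier = math.cos(2 * math.pi * xr / lam)
    return envelope * carrier
```

Sampling this function on a grid produces the filter kernel that is convolved with the image; varying theta and lam yields the filter bank.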

^j HOG: histogram of oriented gradients, a feature descriptor based on the distribution of local gradient orientations.
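The core of HOG can be sketched for a single cell: each pixel votes its gradient magnitude into an orientation bin. This toy version omits the block normalization and interpolated voting of the full descriptor:

```python
import math

def hog_cell(patch, nbins=9):
    """Orientation histogram for one cell of a grayscale patch.

    patch: 2D list of intensities. Central differences give the gradient;
    each interior pixel votes its gradient magnitude into the bin of its
    unsigned (0-180 degree) orientation."""
    h, w = len(patch), len(patch[0])
    hist = [0.0] * nbins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(ang / (180.0 / nbins)), nbins - 1)] += mag
    return hist
```

A vertical intensity edge, for example, produces purely horizontal gradients, so all votes land in the first (0 degree) bin.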

^k TADA: Technology Assisted Dietary Assessment.

^l Tamura: a set of 6 texture features proposed by Hideyuki Tamura.

^m Haar wavelet: a sequence of rescaled square-shaped functions used in wavelet analysis, named after Alfréd Haar.
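One level of the Haar transform is simple enough to sketch directly: the signal splits into pairwise averages (a coarse approximation) and pairwise half-differences (detail coefficients). Normalization conventions vary; this version uses plain averages and half-differences for readability:

```python
def haar_step(signal):
    """One level of an (unnormalized) Haar wavelet transform.

    Splits a length-2n signal into n pairwise averages (approximation)
    and n pairwise half-differences (detail)."""
    avgs = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    dets = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avgs, dets
```

For 2D image features the step is applied along rows and then columns, and recursively to the approximation, yielding the multiresolution coefficients used as texture descriptors.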

^n Steerable filter: an orientation-selective image filter introduced by Freeman and Adelson.

^o DAISY: a local image descriptor introduced by Tola et al [60]; the name is not an acronym.

^p HSV: hue, saturation, and value, a color model that is an alternative representation of the red-green-blue (RGB) color space.
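The RGB-to-HSV conversion is available in the Python standard library, which makes the relationship easy to check; `colorsys` expects all channels scaled to [0, 1]:

```python
import colorsys

# Pure red in RGB maps to hue 0, full saturation, full value.
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
```

HSV-based features (eg, the HSV-SIFT variant in the table) use this space because it separates chromatic content (hue, saturation) from brightness (value), which tends to be more robust to illumination changes than raw RGB.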

^q DCD: dominant color descriptor.

^r MDSIFT: multiscale dense scale-invariant feature transform.

^s SCD: scalable color descriptor.

^t Not available.

^u CNN: convolutional neural network.

^v Inception: a convolutional neural network architecture that won the ImageNet Challenge in 2014, recognized for efficiently using the computing resources inside the network.

^w VGG: Visual Geometry Group, a convolutional neural network architecture named after the research group at the University of Oxford that developed it.

^x AlexNet: a convolutional neural network that won the ImageNet Large Scale Visual Recognition Challenge (also known as the ImageNet challenge) in 2012; it is named after its inventor, Alex Krizhevsky.

^y WISeR: wide-slice residual.

^z MSMVFA: multi-scale multi-view feature aggregation.

^aa MADiMA: Multimedia Assisted Dietary Management.