Bahador et al. [23] |
2021 |
Two scenarios: (1) data from three days of wristband device use from a single person, and (2) an open dataset of 10 individuals performing 186 activities (mobility, eating, personal hygiene, and housework) |
Develop a data fusion technique to achieve a more comprehensive insight into human activity dynamics. The authors considered the statistical dependency of multisensor data and explored intramodality correlation patterns for different activities.
Sensor array with temperature, interbeat intervals, electrodermal activity, photoplethysmography, and heart rate (1st dataset). Wristband with a 9-axis inertial measurement unit (2nd dataset)
Deep residual network. |
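As an illustration only (not the authors' implementation), a minimal residual block over multichannel sensor windows might look like the following PyTorch sketch; the channel count and window length are assumed placeholders.

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """Minimal 1-D residual block for multichannel sensor windows."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Skip connection: output = F(x) + x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)

# Example: a batch of 8 windows, 5 fused sensor channels, 128 samples each.
x = torch.randn(8, 5, 128)
block = ResidualBlock1D(channels=5)
print(block(x).shape)  # torch.Size([8, 5, 128])
```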
Doulah et al. [24] |
2021 |
30 volunteers using the system for 24 h in a pseudo-free-living environment and 24 h in a free-living environment
Food intake detection with a sensor fusion classifier (accelerometer and flex sensor). An image sensor captured data every 15 s to validate the sensor fusion decision.
5 MP camera glasses add-on, with accelerometer and flex sensor in contact with the temporalis muscle
SVM model. |
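A hedged sketch of the general idea of feature-level sensor fusion with an SVM, using scikit-learn and synthetic data; the feature names and dimensions are assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic per-window features, e.g. accelerometer statistics and flex-sensor statistics.
accel_feats = rng.normal(size=(200, 6))   # hypothetical 6 accelerometer features per window
flex_feats = rng.normal(size=(200, 4))    # hypothetical 4 flex-sensor features per window
labels = rng.integers(0, 2, size=200)     # 1 = food intake, 0 = no intake

# Feature-level fusion: concatenate the two modalities before classification.
X = np.hstack([accel_feats, flex_feats])

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, labels)
print(clf.predict(X[:5]))
```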
Heydarian et al. [25] |
2021 |
OREBA dataset [26], composed of OREBA-DIS with 100 participants consuming food in discrete portions and OREBA-SHA with 102 participants consuming a communal dish
Data fusion for automatic food intake gesture detection |
No sensors were used directly; the dataset was obtained from video and inertial sensor data
Fusion of inertial and video data with several methods that use deep learning. |
Kyritsis et al. [27] |
2021 |
FIC [28], FreeFIC [29], and FreeFIC held-out datasets containing triaxial acceleration and orientation velocity signals |
A complete framework for automated modeling of in-meal eating behavior and temporal localization of meals
Data from a smartwatch worn on either the right or left wrist (accelerometer and gyroscope)
CNN for feature extraction and LSTM network to model temporal evolution. Both parts are jointly trained by minimizing a single loss function. |
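A minimal PyTorch sketch of the general CNN-then-LSTM pattern trained with a single joint loss; layer sizes and the two-class output are illustrative assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """CNN extracts per-step features from raw inertial windows; an LSTM models their temporal evolution."""
    def __init__(self, in_channels=6, hidden=32, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):              # x: (batch, channels, time)
        feats = self.cnn(x)            # (batch, 32, time)
        feats = feats.transpose(1, 2)  # (batch, time, 32) for the LSTM
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])   # classify using the last time step

model = CNNLSTM()
x = torch.randn(4, 6, 100)                 # 4 windows, 6 inertial channels, 100 samples
y = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(x), y)  # a single loss trains the CNN and LSTM jointly
loss.backward()
print(float(loss))
```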
Lee [30] |
2021 |
8 participants in noisy environments |
Detect eating events and calculate calorie intake |
Ultrasonic Doppler shifts to detect chewing events and a camera placed on the user's neck
Hidden Markov model recognizer to maximize swallow detection accuracy. The relation between chewing counts and the amount of food is modeled through linear regression. A CNN recognizes food items.
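As a rough illustration of one piece of this pipeline, the linear relation between chew counts and food amount, here is a scikit-learn sketch on synthetic numbers; the slope and data are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)

# Synthetic example: chew counts per bite vs. ingested mass in grams (invented values).
chew_counts = rng.integers(5, 40, size=50).reshape(-1, 1)
mass_grams = 0.8 * chew_counts.ravel() + rng.normal(0, 2, size=50)

reg = LinearRegression().fit(chew_counts, mass_grams)
print(f"grams per chew ~ {reg.coef_[0]:.2f}, intercept {reg.intercept_:.2f}")
print("predicted mass for 25 chews:", reg.predict([[25]])[0])
```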
Mamud et al. [31] |
2021 |
Not specified; students were used, with emphasis on the acoustic signal
Develop a Body Area Network-based automatic dietary monitoring system to detect food type and volume, nutritional benefit, and eating behavior
Camera on chest with system hub, phones with added microphone and dedicated hardware to capture chewing and swallowing sounds, wrist-worn band with accelerometer and gyroscope |
Emphasis was given to the hardware system and the captured signals, but not to signal processing itself.
Mirtchouk and Kleinberg [32] |
2021 |
6 subjects for 6 h in a total of 59 h of data |
Gain insight into dietary activity, namely chews per minute and reasons for food choices
Custom earbud with 2 microphones—one in-ear and one external |
Stochastic variational deep kernel learning (SVDKL), which uses a deep neural network and multiple Gaussian processes, one per feature, to perform multiclass classification.
Rouast and Adam [33] |
2021 |
Two datasets of annotated intake gestures: OREBA [26] and the Clemson University dataset
A single-stage approach that directly decodes the probabilities learned from sensor data into sparse intake detections (eating and drinking)
Video and inertial data |
Deep neural network with weakly supervised training using a Connectionist Temporal Classification loss, and decoding with an extended prefix beam search algorithm.
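A minimal sketch of how a CTC loss is typically set up in PyTorch for sparse event sequences; the dimensions and the three-class label set (blank, eat, drink) are assumptions for illustration, and the extended prefix beam search decoder is not reproduced here.

```python
import torch
import torch.nn as nn

T, N, C = 50, 2, 3   # time steps, batch size, classes (0 = blank, 1 = eat, 2 = drink)

# Frame-level log-probabilities, e.g. the output of a deep network over sensor frames.
log_probs = torch.randn(T, N, C).log_softmax(dim=2).requires_grad_()

# Weak (sequence-level) supervision: only the ordered list of intake events per recording.
targets = torch.tensor([1, 2, 1, 1])                 # concatenated target sequences
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.tensor([3, 1])                # first sample has 3 events, second has 1

ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```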
Fuchs et al. [34] |
2020 |
10,035 labeled product image instances created by the authors |
Detection of diet-related activities to support healthy food choices
Mixed reality headset-mounted cameras |
A comparison of several neural networks was performed based on object detection and classification accuracy.
Heremans et al. [35] |
2020 |
16 subjects for training, and 37 healthy control subjects and 73 patients with functional dyspepsia for testing |
Automatic food intake detection through dynamic analysis of heart rate variability |
Electrocardiogram |
ANN with leave-one-out cross-validation.
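A brief scikit-learn sketch of leave-one-out evaluation of a small neural network on synthetic HRV-style features; the feature set and network size are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic stand-in for heart-rate-variability features per segment (e.g. RMSSD, SDNN, ...).
X = rng.normal(size=(40, 5))
y = rng.integers(0, 2, size=40)   # 1 = food intake segment, 0 = no intake

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```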
Hossain et al. [36] |
2020 |
15,343 images (2127 food images and 13,216 non-food images)
Classify images as food or non-food
Wearable egocentric camera |
CNN-based image classifier running on a Cortex-M7 microcontroller.
Rachakonda et al. [37] |
2020 |
1000 images obtained from copyright-free sources—800 used for training and 200 for testing |
Focus on the eating behavior of users, detect normal eating and stress eating, and create awareness about their food intake behaviors
Camera mounted on glasses |
Machine learning models for automatic classification of the food on the plate, automatic object detection from the plate, and automatic calorie quantification.
Sundarramurthi et al. [38] |
2020 |
Food101 dataset [39] (101,000 images with 101 food categories) |
Develop a GUI-based interactive tool |
Mobile device camera |
Convolutional Neural Network for food image classification and detection. |
Ye et al. [40] |
2020 |
COCO2017 dataset [41] |
A method for food smart recognition and automatic dietary assessment on a mobile device |
Mobile device camera |
Mask R-CNN. |
Farooq et al. [42] |
2019 |
40 participants |
Create an automatic ingestion monitor |
Automatic ingestion monitor—hand gesture sensor used on the dominant hand, piezoelectric strain sensor, and a data collection module |
Neural network classifier. |
Johnson et al. [43] |
2019 |
25 min of data divided into 30 s segments, recorded while eating, shaving, and brushing teeth
Development of a wearable sensor system for detection of food consumption |
Two wireless battery-powered sensor assemblies, each with sensors on the wrist and upper arm. Each unit has a 9-axis inertial measurement unit with accelerometer, magnetometer, and gyroscope
Machine learning to reduce false-positive eating detections after a Kalman filter is used to estimate the position of the hand relative to the mouth.
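To make the Kalman-filtering step concrete, here is a generic one-dimensional constant-velocity Kalman filter in NumPy; it is a textbook sketch, not the authors' tracker, and the noise parameters and sample rate are arbitrary.

```python
import numpy as np

# Constant-velocity Kalman filter for one coordinate of hand position (illustrative values).
dt = 0.02                                   # assumed 50 Hz sample period
F = np.array([[1, dt], [0, 1]])             # state transition (position, velocity)
H = np.array([[1, 0]])                      # only position is measured
Q = 1e-3 * np.eye(2)                        # process noise
R = np.array([[0.05]])                      # measurement noise

x = np.zeros((2, 1))                        # initial state
P = np.eye(2)                               # initial covariance

rng = np.random.default_rng(3)
true_pos = np.cumsum(np.full(100, 0.01))    # hand slowly moving toward the mouth
measurements = true_pos + rng.normal(0, 0.05, size=100)

for z in measurements:
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = np.array([[z]]) - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print("filtered position estimate:", float(x[0, 0]))
```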
Konstantinidis et al. [44] |
2019 |
85 videos with people eating from a side view |
Detect food bite instances accurately, robustly, and automatically |
Cameras to capture body and face motion videos |
Deep network to extract human motion features from video sequences. A two-stream deep network is proposed to process body and face motion, together with the data from the first deep network, to take advantage of both types of features simultaneously.
Kumari et al. [45] |
2019 |
30 diabetic persons to confirm glucose levels with a glucometer |
Regulate glycemic index through calculation of food size, chewing style and swallow time |
Acoustic sensor placed over the trachea, using MEMS technology
Deep belief network combining a Belief Net and a Restricted Boltzmann Machine.
Park et al. [46] |
2019 |
4000 food images obtained by taking pictures of dishes in restaurants and through Internet searches
Develop Korean food image detection and recognition model for use in mobile devices for accurate estimation of dietary intake |
Camera |
Training with the TensorFlow machine learning framework and a batch size of 64. The authors present a deep convolutional neural network, K-foodNet.
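A minimal Keras sketch showing how a small image CNN would be trained in TensorFlow with a batch size of 64; the layer configuration and synthetic data are placeholders, not K-foodNet itself.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for labelled food images: 640 RGB images, 10 hypothetical classes.
x_train = np.random.rand(640, 64, 64, 3).astype("float32")
y_train = np.random.randint(0, 10, size=640)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Batch size of 64, as reported for the reviewed model's training setup.
model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=1)
```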
Qiu et al. [47] |
2019 |
360° videos and the COCO dataset to train Mask R-CNN
Dietary intake in shared food scenarios: detection of the subject's face, hands, and food
Video camera (Samsung Gear 360)
Mask R-CNN to detect the food class, a bounding box indicating the location, and a segmentation mask for each food item. Predicted food masks could presumably be used to calculate food volume.
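A short torchvision sketch of running an off-the-shelf, COCO-pretrained Mask R-CNN on an image and reading back boxes, labels, scores, and masks; this is generic inference code, not the food-specific model trained in the study.

```python
import torch
import torchvision

# COCO-pretrained Mask R-CNN from torchvision (not fine-tuned on food classes).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)         # placeholder for a camera frame, values in [0, 1]

with torch.no_grad():
    output = model([image])[0]          # dict with 'boxes', 'labels', 'scores', 'masks'

keep = output["scores"] > 0.5
print("detections above threshold:", int(keep.sum()))
print("mask tensor shape:", tuple(output["masks"].shape))  # (N, 1, H, W) soft masks
# A binarized mask's pixel count could serve as a crude proxy when estimating food area or volume.
```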
Raju et al. [48] |
2019 |
Two datasets (food and no food) with 1600 images each |
Minimization of the number of images that need to be processed, either by a human or by a computer vision algorithm, for food image analysis
Automatic Ingestion Monitor 2.0 with camera mounted on glasses frame |
Image processing techniques: lens barrel distortion correction, image sharpness analysis, and face detection and blurring.
Turan et al. [49] |
2018 |
8 participants, 4 male and 4 female, 22–29 years old
Detection of ingestion sounds, namely swallowing and chewing |
Throat microphone with IC recorder |
Captured sounds are transformed into spectrograms using short-time Fourier transforms, and a convolutional neural network is used for the food intake classification problem.
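A brief sketch of the spectrogram step with SciPy, plus the shape a 2-D CNN would consume; the sampling rate and window settings are placeholders rather than the study's parameters.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                   # assumed sampling rate of the throat microphone
rng = np.random.default_rng(4)
audio = rng.normal(size=fs * 2)              # 2 s of stand-in audio

# Short-time Fourier transform magnitude -> spectrogram (frequency bins x time frames).
f, t, Sxx = spectrogram(audio, fs=fs, nperseg=512, noverlap=256)
log_spec = np.log(Sxx + 1e-10)

print("spectrogram shape (freq, time):", log_spec.shape)
# A CNN for intake-sound classification would take this as a 1-channel image:
cnn_input = log_spec[np.newaxis, np.newaxis, :, :]   # (batch, channel, freq, time)
print("CNN input shape:", cnn_input.shape)
```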
Wan et al. [50] |
2018 |
300 types of Chinese food and 101 kinds of Western food from Food-101
Identify the ingredients of the food to determine if diet is healthy |
Digital camera |
p-Faster R-CNN based on Faster R-CNN with the Zeiler and Fergus model and the Caffe framework.
Lee [51] |
2017 |
10 participants with 6 types of food |
Food intake monitoring, estimating the processes of chewing and swallowing |
Acoustic Doppler sonar |
Analysis of the jaw and its vibration pattern depending on the type of food, followed by feature extraction and classification with an artificial neural network.
Nguyen et al. [52] |
2017 |
10 participants in a lab environment |
Count the number of swallows during food intake to estimate caloric values
Wearable necklace with piezoelectric sensors, accelerometer, gyroscope and magnetometer |
A recurrent neural network framework, named SwallowNet, detects swallows in a continuous data stream after being trained on raw data using automated feature learning methods.
Papapanagiotou et al. [53] |
2017 |
60 h semi-free living dataset |
Design a convolutional neural network for chewing detection |
In-ear microphone |
1-dimensional convolutional neural network. The authors also present leave-one-subject-out results with fusion of acoustic and inertial sensors.
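A compact PyTorch sketch of a 1-D convolutional network applied directly to raw audio frames for binary chewing detection; the filter counts and frame length are assumptions, not the published network.

```python
import torch
import torch.nn as nn

# 1-D CNN operating directly on raw in-ear audio frames (assumed 0.2 s at 2 kHz = 400 samples).
chewing_cnn = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=9, stride=2), nn.ReLU(),
    nn.Conv1d(8, 16, kernel_size=9, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),        # global pooling over time
    nn.Flatten(),
    nn.Linear(16, 2),               # chewing vs. non-chewing logits
)

frames = torch.randn(32, 1, 400)    # batch of 32 single-channel audio frames
print(chewing_cnn(frames).shape)    # torch.Size([32, 2])
```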
Farooq et al. [54] |
2016 |
120 meals from 4 visits by 30 participants, of which 104 meals were analyzed
Automatic measurement of chewing count and chewing rate |
Piezoelectric sensor to capture lower jaw motion |
ANN machine learning to classify epochs as chewing or not chewing. Epochs were derived from sensor data processing. |
Farooq et al. [55] |
2014 |
30 subjects (5 were left out) in a 4-visit experiment |
Automatic detection of food intake |
Electroglottograph, PS3 Eye camera, and miniature throat microphone
Three-layer feed-forward neural network trained with the backpropagation algorithm, using the MATLAB neural network toolbox.
Dong et al. [56] |
2013 |
3 subjects, one female and two males |
Development of a wireless, wearable diet monitoring system to detect solid and liquid swallow events based on breathing cycles
Piezoelectric respiratory belt |
Machine learning for feature extraction and selection. |
Pouladzadeh et al. [57] |
2013 |
Over 200 images of food, 100 for the training set and another 100 for the testing set
Measurement and recording of food calorie intake
Built-in camera of mobile device |
Image processing using color segmentation, k-means clustering, and texture segmentation to separate food items. Food portion identification through SVM, and the caloric value of food obtained from a nutritional table.
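To illustrate the color-clustering step only, here is a small scikit-learn sketch that clusters the pixels of a synthetic image by color with k-means; the cluster count and image are invented for the example, and the texture, SVM, and calorie-table stages are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)

# Synthetic 60x60 RGB "plate" image: two colored regions plus noise.
image = np.zeros((60, 60, 3))
image[:, :30] = [0.9, 0.6, 0.2]     # e.g. a rice-like region
image[:, 30:] = [0.2, 0.7, 0.3]     # e.g. a vegetable-like region
image += rng.normal(0, 0.05, image.shape)

# k-means on pixel colors separates the image into candidate food regions.
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
segment_map = kmeans.labels_.reshape(60, 60)

print("pixels per segment:", np.bincount(segment_map.ravel()))
# Each segment could then be passed to texture analysis and an SVM to identify the food portion.
```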