Table 1.
First Author | Year | Journal | Methods | Study Setting | Device | Outcomes | Validity | Precision |
---|---|---|---|---|---|---|---|---|
Cadavid [33] | 2012 | Pers. Ubiquit. Comput. | Active Appearance Model (AAM) for face tracking, and spectral analysis on the temporal window of the model parameter values; binary support vector machine classifier for chewing events | Laboratory | Not reported; 37 videos at 24 fps; frame resolution: 640 × 480 | Chewing detection | Manual annotation for chewing events | 93% after cross-validation |
Okamoto [34] | 2014 | IEEE International Conference on Multimedia and Expo Workshops | Mouth detector limited to the lower part of detected face; chopstick detection using OpenCV Hough transform for straight lines | Laboratory | Smartphone Google Nexus 5 (2.3 GHz Quad Core, Android 4.4), inner camera; frontal view | Food intake estimation | N.A. | N.A. |
Hantke [35] | 2018 | Proceedings of the 20th ACM International Conference on Multimodal Interaction | OpenFace facial landmarks extraction for tracking the mouth | Office room | Logitech HD Pro Webcam C920; 30 fps; resolution: 1280 × 720; frontal view | Food liking | Leave-One-Out Cross-Validation and SVM | Likability 0.583 |
Haider [36] | 2018 | Proceedings of the 20th ACM International Conference on Multimodal Interaction | OpenSMILE for facial landmarks extraction, coupled with OpenSMILE audio-feature extraction | Office room | Logitech HD Pro Webcam C920; 30 fps; resolution: 1280 × 720; frontal view | Food liking | Leave-One-Out Cross-Validation and active feature transformation | 0.61 |
Konstantinidis [37] | 2019 | Computer Vision Systems | OpenPose for mouth and hands tracking; Deep Network (3 Conv + shortcut, 3 Conv + shortcut, 3 LSTM) | Laboratory | 85 videos; Samsung digital camcorder; 1.5 m away from the subject; side view | Automatic bite detection | F-Score: 0.9173 | 0.9175 |
Qiu [38] | 2019 | IEEE 16th International Conference on Wearable and Implantable Body Sensor Networks | Mask R-CNN for 360-degree camera meal videos; thresholds for assessing pixel intersection between hand-face and hand-food to infer eating events | Free-living (indoor food-sharing scenarios) | Samsung Gear 360 camera; 1024 × 1024 pixels | Food intake estimation | N.A. | N.A. |
Hossain [39] | 2020 | IEEE Access | Face detection with manual selection of the region of interest; CNN for bite/non-bite classification; optical flow for spatial chewing motion at every pixel | Laboratory | 84 videos; SJCAM SJ4000 Action Camera; 1080p video at 30 fps; side view | Automatic count of bites and chews | Manual annotation with 3-button system and LabVIEW software (custom-made) | Bites: 88.9% ± 7.4%; Chews: 88.64% ± 5.29% |
Rouast and Adam [40] | 2020 | IEEE J. Biomed. Health Inform. | CNN for hand-to-mouth movement in 360-degree meal videos | Free-living (indoor group meal) | 102 videos; 360fly 4K camera; 24 fps | Intake gesture detection | N.A. | F1-score: 0.858 |
Konstantinidis [41] | 2020 | Nutrients | OpenPose skeletal and mouth features extracted for training the RABiD algorithm. Two-stream data: 2D coordinates and distances from mouth corners, and from upper body | Laboratory | Samsung digital camcorder; 1.5 m away from the subject; side view; resolution: 576p (720 × 576 pixels) at 25 fps | Meal duration and bite counts | Manual annotation (Noldus Observer XT) | F1-score: 0.948 |
Nour [42] | 2021 | Advances in Social Sciences Research Journal | Facial landmarks (dlib) for tracking jawline movement; OpenPose for 2D pose estimation | N.A. | N.A. | Real-time eating activity tracking | Manual annotation | N.A. |
Park [43] | 2020 | Robotics and Autonomous Systems | Facial landmarks (dlib) for mouth-pose estimator; algorithmic model for improving 3D estimation, location, and orientation of the mouth | Laboratory | Intel SR300 RGB-D camera | Robot active feeding assistance | Wrist-mounted camera | N.A. |
Alshboul [44] | 2021 | Sensors | Time series data consisting of Euclidean distance between jaw/mouth landmarks and a reference facial landmark | Free-living (outdoors, indoors, and public spaces) | 300 videos; Huawei Y7 Prime 2018 smartphone; 13 MP camera; resolution: 1080p at 30 fps; frontal view | Number of chews | Manual annotation (intra-class correlation coefficient: slow 0.96, normal 0.94, fast 0.91) | Average error ± SD: 5.42% ± 4.61% (slow chewing); 7.47% ± 6.85% (normal chewing); 9.84% ± 9.55% (fast chewing) |
Kato [45] | 2021 | Gerodontology | Video fluoroscopy of swallowing for determining which foods are more appropriate for elderly people | Laboratory | N.A. | Association between masticatory movements and food texture in older adults | N.A. | N.A. |
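
Several rows in Table 1 derive chew counts from facial landmark geometry. The Alshboul [44] row, for instance, uses a time series of Euclidean distances between jaw/mouth landmarks and a reference facial landmark, and reports error relative to manual counts. The Python sketch below illustrates that general idea only and is not the authors' implementation. The hysteresis thresholds `low` and `high`, the function names, and the synthetic landmark coordinates are all assumptions introduced for illustration.

```python
import math


def landmark_distances(jaw_points, ref_points):
    """Per-frame Euclidean distance between a jaw landmark and a reference landmark."""
    return [math.dist(jaw, ref) for jaw, ref in zip(jaw_points, ref_points)]


def count_chews(distances, low, high):
    """Count open/close cycles with hysteresis (an assumed rule, not from the table).

    A chew is counted each time the distance rises to `high` or above after
    having dropped to `low` or below (or from the initial closed state).
    """
    chews = 0
    mouth_open = False
    for d in distances:
        if not mouth_open and d >= high:
            mouth_open = True
            chews += 1
        elif mouth_open and d <= low:
            mouth_open = False
    return chews


def percent_error(predicted, manual):
    """Absolute error of the automatic count relative to the manual count, in percent."""
    return abs(predicted - manual) / manual * 100.0


# Synthetic data (hypothetical): 60 frames, the jaw landmark oscillates
# vertically with a 10-frame period around 50 px from a fixed reference point.
ref_points = [(0.0, 0.0)] * 60
jaw_points = [(0.0, 50.0 + 5.0 * math.sin(2 * math.pi * t / 10)) for t in range(60)]

distances = landmark_distances(jaw_points, ref_points)
chews = count_chews(distances, low=48.0, high=52.0)
print(chews, percent_error(chews, manual=6))
```

Using two thresholds rather than one keeps small frame-to-frame jitter around a single cut-off from being counted as extra chews. In practice the thresholds would need tuning per subject and camera distance.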