Table 4.

Ref. | Image-Based Parameters | Extracted Features | Classification Method | Description | Quality Metric | Dataset
---|---|---|---|---|---|---
[16] | Mouth | Yawning | Cold and hot voxels [37] | A fatigue-detection method based on yawning detection using thermal imaging; cold and hot voxels were used to detect yawning. | Accuracy: cold voxels: 71%; hot voxels: 87% | Prepared their own dataset [36]
[38] | Respiration (thermal camera) | Standard deviation and mean of the respiration rate, and the inspiration-to-expiration time ratio | SVM and KNN | Used facial thermal imaging to study the driver's respiration and relate it to drowsiness. | Accuracy: SVM: 90%, KNN: 83%; Sensitivity: SVM: 92%, KNN: 82%; Precision: SVM: 91%, KNN: 90% | Prepared a new thermal-image dataset
[39] | Eye | Eyelid curvature | Classification based on the period of eye closure | From the concavity of the eyelid's curvature, the system determined whether the eye was open or closed, then detected drowsiness from the duration of eye closure. | Accuracy: dataset 1: 95%; dataset 2: 70%; dataset 3: >95% | Dataset 1: prepared their own image dataset; dataset 2: benchmark dataset [40]; dataset 3: prepared their own video dataset
[41] | Eye | Eye state (open/closed) | Proposed optical correlation with a deformed filter | Used an optical Vander Lugt correlator to precisely estimate the eye's location in the correlator's Fourier plane. | Different accuracies for different datasets | FEI [44], ICPR [45], BioID [46], GI4E [47], and SHRP2 [48]
[49] | Eye | Eye aspect ratio (EAR) | Multilayer perceptron, RF, and SVM | Tracked eye-blink duration in video streams as an indicator of drowsiness, using the EAR (see the formula following this table). Overall, the SVM showed the best performance. | Accuracy: SVM: 94.9% | Prepared their own dataset
[30] | Face and eye | PERCLOS, blink frequency, and maximum eye-closure duration | KNN, SVM, logistic regression, and ANN | A nonintrusive system based on face and eye-state tracking; the best-performing models were the KNN and the ANN (a PERCLOS computation sketch follows this table). | Accuracy: KNN: 72.25%, ANN: 71.61%; Sensitivity: KNN: 83.33%, ANN: 85.56% | NTHUDDD public dataset [35]
[50] | Eye | Eye closure | FD-NN, TL-VGG16, and TL-VGG19 | A real-time system based on the area of eye closure, using CNNs; three networks were introduced for eye-closure classification: FD-NN, TL-VGG16, and TL-VGG19. | Accuracy: FD-NN: 98.15%; TL-VGG16: 95.45%; TL-VGG19: 95% | ZJU gallery and prepared their own dataset
[51] | Eye | 34 eye-tracking features | RF and non-linear SVM | Used 34 features of eye-tracking signals to detect drowsiness; the features were extracted from overlapping epochs of the eye signals with different lengths, and the labels were derived from EEG signals. | Accuracy: RF: 88.37% to 91.18%, SVM: 77.1% to 82.62%; Sensitivity (10 s epochs): RF: 88.1%, SVM: 79.1% | Prepared their own dataset
[52] | Eye and mouth | PERCLOS, eye-closure duration, and average mouth-opening time | Mamdani fuzzy inference system | The state of the extracted parameters is determined through a cascade of regression-tree algorithms; a Mamdani fuzzy inference system then estimates the driver's state. | Accuracy: 95.5%; Precision: 93.3% | 300-W dataset [53]
[54] | Eye and mouth | Eye closure and mouth openness over a period of time | Circular Hough transform | The circular Hough transform is applied to check whether the mouth is open and whether the iris is detected; the driver's state is determined from these two measures (see the sketch following this table). | Accuracy: 94% | Prepared their own dataset
[55] | Eye and head | Eye-blinking frequency and head-tilting frequency | Template matching to detect the eyes; the frequencies of head tilting and eye blinking give the drowsiness level | The drowsiness level is computed from the frequencies of head tilting and eye blinking, on a scale of 0 to 100; if it reaches 100, a loud audible warning is triggered. | Accuracy: 99.59%; Precision: 97.86% | Prepared their own dataset
[56] | Mouth and eye | Proportion of closed-eye frames to total frames in 1 min, continuous eye-closure time, blinking frequency, and number of yawns in 1 min | Multiple CNNs-kernelized correlation filters for face tracking; a newly proposed algorithm for drowsiness detection | The multiple CNNs-kernelized correlation filters method tracks the face and extracts the image-based parameters; if the driver is found drowsy, an alert is issued. | Accuracy: 92% | CelebA dataset [57], YawDD dataset [58], and newly prepared video data
[59] | Face, hands, and behavior (head, eye, or mouth movements) | Facial expressions, behavioral features, head gestures, and hand gestures | SoftMax classifier | An architecture that uses four deep-learning models to extract four different types of features. | Accuracy: 85%; Sensitivity: 82%; Precision: 86.3% | NTHUDDD public dataset [35]
[17] | Eye, head, and mouth | Eye-closure duration, head nodding, and yawning | Two-stream CNN | Used multi-task cascaded CNNs to locate the mouth and eyes; extracted static and dynamic features from a partial facial image and partial facial optical flow, respectively; then combined the features to classify the image data. | Accuracy: 97.06%; Sensitivity: 96.74%; Precision: 97.03% | NTHUDDD public dataset [35]
[63] | Eye, mouth, head, and scene conditions | Facial changes in the eyes, mouth, and head; the illumination conditions while driving; and whether glasses are worn | 3D deep CNN | A framework of four models that recognizes the driver's alertness status using a condition-adaptive representation. | Accuracy: 76.2% | NTHUDDD public dataset [35]
[33] | Eye, head, and mouth | Blinking rate, head nodding, and yawning frequency | Fisher score for feature selection and non-linear SVM for classification | A system based on a hand-crafted, compact face-texture descriptor that captures the most discriminative drowsiness features. | Accuracy: 79.84% | NTHUDDD public dataset [35]
[18] | Facial features | Face feature vectors | SVM | Used facial-motion information entropy extracted from real-time videos; the algorithm consists of four modules. | Accuracy: 94.32% | YawDD dataset [58]
[68] | Facial features and head movements | Implicitly learns the important features, such as eye closure, mouth position, chin or brow raises, frowning, and nose wrinkles | 3D CNN | Performed DDD based on activity prediction, through a depth-wise separable 3D CNN, using real-time face video; an advantage of this method is that it decides the important features implicitly rather than pre-specifying a feature set. | Accuracy: 73.9% | NTHUDDD public dataset [35]
[69] | Eye and mouth | Temporal facial feature vectors formed from spatial features | LSTM | Real-time DDD based on a combination of a CNN and an LSTM, consisting of two parts: spatial and temporal. | Accuracy: 84.85% | NTHUDDD public dataset [35]
[70] | Eye, head, and mouth | Yawning, eye closure, and head nodding | Multi-layer model-based 3D convolutional networks | Used a recurrent neural network (RNN)-based architecture, called multi-layer model-based 3D convolutional networks, to detect fatigue. | Accuracy: 97.3%; Sensitivity: 92%; Precision: 72% | NTHUDDD public dataset [35]
[72] | Eye and mouth | PERCLOS and mouth-opening degree | EM-CNN (eye and mouth CNN) | Applied face detection and feature-point localization using a multi-task cascaded CNN architecture, and EM-CNN to detect the eye and mouth states from the ROI. | Accuracy: 93.62%; Sensitivity: 93.64% | Driving-image dataset from the Biteda company
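For reference, the EAR used in [49] is commonly defined over six landmarks $p_1, \dots, p_6$ outlining one eye (the two corners plus upper- and lower-lid points). Whether [49] follows exactly this landmark convention is an assumption, but the standard form of the ratio is given below: as the eye closes, the vertical distances shrink and the EAR tends toward zero.

```latex
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}
```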
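Several rows ([30], [52], [72]) use PERCLOS, the percentage of time within a window during which the eyes are (nearly) closed. The following is a minimal sketch of one way to compute it from per-frame eye-openness values; the 80%-closure criterion (P80), the frame rate, and the window length are illustrative assumptions, not parameters taken from the cited works.

```python
import numpy as np

def perclos(eye_openness, closed_threshold=0.2, fps=30, window_s=60):
    """PERCLOS over the most recent window: the fraction of frames in
    which the eye counts as closed.

    eye_openness: per-frame openness values in [0, 1] (e.g., a
    normalized EAR). A frame counts as closed when openness falls
    below closed_threshold; 0.2 approximates the common P80 criterion
    (eye at least 80% closed). All defaults are assumptions.
    """
    window = np.asarray(eye_openness[-fps * window_s:], dtype=float)
    return float(np.mean(window < closed_threshold))

# Hypothetical usage: 60 s of synthetic openness values at 30 fps.
rng = np.random.default_rng(0)
openness = rng.uniform(0.0, 1.0, size=30 * 60)
print(f"PERCLOS = {perclos(openness):.2f}")
```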
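For the circular-Hough-transform approach of [54], iris detection can be sketched with OpenCV's `HoughCircles`. This is only an illustration of the general technique, assuming a pre-cropped grayscale eye region; the file name and every parameter value below are assumptions, not settings from the paper.

```python
import cv2

# Hypothetical pre-cropped grayscale eye region (file name is an assumption).
eye = cv2.imread("eye_roi.png", cv2.IMREAD_GRAYSCALE)
eye = cv2.medianBlur(eye, 5)  # smoothing suppresses spurious circle votes

# Circular Hough transform: look for a single iris-sized circle.
circles = cv2.HoughCircles(
    eye,
    cv2.HOUGH_GRADIENT,
    dp=1,                    # accumulator resolution = image resolution
    minDist=eye.shape[0],    # at most one circle in the eye region
    param1=100,              # Canny high threshold
    param2=30,               # accumulator threshold: lower finds more circles
    minRadius=5,
    maxRadius=eye.shape[0] // 2,
)

# In the scheme of [54], a detected iris circle indicates an open eye.
iris_detected = circles is not None
print("iris detected:", iris_detected)
```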