Sensors. 2022 Mar 7;22(5):2069. doi: 10.3390/s22052069

Table 4. Image-based drowsiness detection systems.

| Ref. | Image-Based Parameters | Extracted Features | Classification Method | Description | Quality Metric | Dataset |
|------|------------------------|--------------------|-----------------------|-------------|----------------|---------|
| [16] | Mouth | Yawning | Cold and hot voxels [37] | A fatigue detection method based on yawning detection using thermal imaging; the cold and hot voxels were used to detect yawning. | Accuracy: cold voxels 71%, hot voxels 87% | Prepared their own dataset [36] |
| [38] | Respiration (thermal camera) | Standard deviation and mean of the respiration rate, and the inspiration-to-expiration time ratio | SVM and KNN | Used facial thermal imaging to study the driver's respiration and relate it to drowsiness. | Accuracy: SVM 90%, KNN 83%; sensitivity: SVM 92%, KNN 82%; precision: SVM 91%, KNN 90% | New thermal image dataset was prepared |
| [39] | Eye | Eyelids' curvature | Classification based on the period of eye closure | Based on the concavity of the eyelids' curvature, the system determined whether the eye was open or closed, then detected drowsiness from the eye-closure period. | Accuracy: dataset 1 95%, dataset 2 70%, dataset 3 >95% | Dataset 1: their own image dataset; dataset 2: benchmark dataset [40]; dataset 3: their own video dataset |
| [41] | Eye | Eye state (open/closed) | Proposed optical correlation with a deformed filter | Used an optical Vander Lugt correlator to precisely estimate the eye's location in the correlator's Fourier plane. | Different accuracies for different datasets | FEI [44], ICPR [45], BioID [46], GI4E [47], and SHRP2 [48] |
| [49] | Eye | The eyes' EAR value | Multilayer perceptron, RF, and SVM | Tracked eye-blink duration in video streams, as an indicator of drowsiness, using the EAR; overall, the SVM showed the best performance. | Accuracy: SVM 94.9% | Prepared their own dataset |
| [30] | Face and eye | PERCLOS, blink frequency, and maximum eye-closure duration | KNN, SVM, logistic regression, and ANN | A nonintrusive system based on face and eye-state tracking; the final results revealed that the best models were the KNN and ANN. | Accuracy: KNN 72.25%, ANN 71.61%; sensitivity: KNN 83.33%, ANN 85.56% | NTHUDDD public dataset [35] |
| [50] | Eye | Eye closure | FD-NN, TL-VGG16, and TL-VGG19 | A real-time system based on the area of eye closure using CNNs; three networks were introduced for eye-closure classification: FD-NN, TL-VGG16, and TL-VGG19. | Accuracy: FD-NN 98.15%, TL-VGG16 95.45%, TL-VGG19 95% | ZJU gallery and their own dataset |
| [51] | Eye | 34 eye-tracking features | RF and non-linear SVM | Used 34 features of eye-tracking signals, extracted from overlapping signal epochs of different lengths, to detect drowsiness; the labels were derived from EEG signals. | Accuracy: RF 88.37–91.18%, SVM 77.1–82.62%; sensitivity (10 s epoch): RF 88.1%, SVM 79.1% | Prepared their own dataset |
| [52] | Eye and mouth | PERCLOS, eye-closing duration, and average mouth-opening time | Mamdani fuzzy inference system | The state of the extracted parameters is determined through a cascade of regression-tree algorithms; a Mamdani fuzzy inference system then estimates the driver's state. | Accuracy: 95.5%; precision: 93.3% | 300-W dataset [53] |
| [54] | Eye and mouth | Eye closure and mouth openness over a period of time | Circular Hough transform | The circular Hough transform is applied to check whether the mouth is open or the iris is detected; the driver's state is determined from these two measures. | Accuracy: 94% | Prepared their own dataset |
| [55] | Eye and head | Frequencies of eye blinking and head tilting | Template matching to detect the eyes; the drowsiness level is computed from the head-tilting and eye-blinking frequencies | The drowsiness level is determined, on a scale of 0–100, from the frequencies of head tilting and eye blinking; if it reaches 100, a loud audible warning is triggered. | Accuracy: 99.59%; precision: 97.86% | Prepared their own dataset |
| [56] | Mouth and eye | Proportion of closed-eye frames to total frames in 1 min, continuous eye-closure time, blinking frequency, and number of yawns in 1 min | Face tracking: multiple CNNs-kernelized correlation filters method; drowsiness detection: newly proposed algorithm | The multiple CNNs-kernelized correlation filters method tracks the face and extracts the image-based parameters; if found drowsy, the driver is alerted. | Accuracy: 92% | CelebA dataset [57], YawDD dataset [58], and newly prepared video data |
| [59] | Face, hands, and behavior (head, eye, or mouth movements) | Facial expression, behavioral features, head gestures, and hand gestures | SoftMax classifier | An architecture that uses four deep-learning models to extract four different types of features. | Accuracy: 85%; sensitivity: 82%; precision: 86.3% | NTHUDDD public dataset [35] |
| [17] | Eye, head, and mouth | Eye-closure duration, head nodding, and yawning | Two-stream CNN | Used multi-task cascaded CNNs to locate the mouth and eyes, extracted static and dynamic features from a partial facial image and partial facial optical flow, respectively, and combined the features to classify the image data. | Accuracy: 97.06%; sensitivity: 96.74%; precision: 97.03% | NTHUDDD public dataset [35] |
| [63] | Eye, mouth, head, and scene conditions | Facial changes in the eyes, mouth, and head; driving illumination conditions; and whether glasses are worn | 3D deep CNN | The framework contained four models to recognize the driver's alertness status, using a condition-adaptive representation. | Accuracy: 76.2% | NTHUDDD public dataset [35] |
| [33] | Eye, head, and mouth | Blinking rate, head nodding, and yawning frequency | Fisher score for feature selection and non-linear SVM for classification | Based on a hand-crafted compact face-texture descriptor that captures the most discriminant drowsiness features. | Accuracy: 79.84% | NTHUDDD public dataset [35] |
| [18] | Facial features | Face feature vectors | SVM | Used facial-motion information entropy extracted from real-time videos; the algorithm comprised four modules. | Accuracy: 94.32% | YawDD dataset [58] |
| [68] | Facial features and head movements | Implicitly learned features such as eye closure, mouth position, chin or brow raises, frowning, and nose wrinkles | Depth-wise separable 3D CNN | DDD was performed as activity prediction on real-time face video; an advantage of this method is that the important features are learned implicitly rather than pre-specified. | Accuracy: 73.9% | NTHUDDD public dataset [35] |
| [69] | Eye and mouth | Temporal facial feature vectors formed from spatial features | LSTM | Real-time DDD based on a combination of CNN and LSTM, consisting of two parts: spatial and temporal. | Accuracy: 84.85% | NTHUDDD public dataset [35] |
| [70] | Eye, head, and mouth | Yawning, eye closure, and head nodding | Multi-layer model-based 3D convolutional networks | Used a recurrent architecture, called multi-layer model-based 3D convolutional networks, to detect fatigue. | Accuracy: 97.3%; sensitivity: 92%; precision: 72% | NTHUDDD public dataset [35] |
| [72] | Eye and mouth | PERCLOS and mouth-opening degree | Eye-and-mouth CNN (EM-CNN) | Applied face detection and feature-point location using a multi-task cascaded CNN architecture, with the EM-CNN detecting the mouth and eye states from the ROI. | Accuracy: 93.62%; sensitivity: 93.64% | Driving-images dataset from the Biteda company |