Table 4.

Ref. | Image-Based Parameters | Extracted Features | Classification Method | Description | Quality Metric | Dataset
---|---|---|---|---|---|---
[16] | Mouth | Yawning | Cold and hot voxels [37] | A fatigue-detection method based on yawning detection using thermal imaging; cold and hot voxels were used to detect yawning. | Accuracy: cold voxels: 71%; hot voxels: 87% | Prepared their own dataset [36]
[38] | Respiration (thermal camera) | Standard deviation and mean of the respiration rate, and the inspiration-to-expiration time ratio | SVM and KNN | Used facial thermal imaging to study the driver's respiration and relate it to drowsiness. | Accuracy: SVM: 90%, KNN: 83%; Sensitivity: SVM: 92%, KNN: 82%; Precision: SVM: 91%, KNN: 90% | Prepared a new thermal-image dataset
[39] | Eye | Eyelid curvature | Classification based on the period of eye closure | From the concavity of the eyelid's curvature, the system determined whether the eye was open or closed, then detected drowsiness from the duration of eye closure. | Accuracy: dataset 1: 95%; dataset 2: 70%; dataset 3: >95% | Dataset 1: prepared their own image dataset; dataset 2: benchmark dataset [40]; dataset 3: prepared their own video dataset
[41] | Eye | Eye state (open/closed) | Proposed optical correlation with a deformed filter | Used an optical Vander Lugt correlator to precisely estimate the eye's location in the correlator's Fourier plane. | Different accuracies for different datasets | FEI [44], ICPR [45], BioID [46], GI4E [47], and SHRP2 [48]
[49] | Eye | Eye aspect ratio (EAR) | Multilayer perceptron, RF, and SVM | Tracked eye-blink duration in video streams as an indicator of drowsiness, using the EAR (see the formula following this table). Overall, the SVM showed the best performance. | Accuracy: SVM: 94.9% | Prepared their own dataset
[30] | Face and eye | PERCLOS, blink frequency, and maximum eye-closure duration | KNN, SVM, logistic regression, and ANN | A nonintrusive system based on face and eye-state tracking; the best-performing models were the KNN and the ANN (a PERCLOS computation sketch follows this table). | Accuracy: KNN: 72.25%, ANN: 71.61%; Sensitivity: KNN: 83.33%, ANN: 85.56% | NTHUDDD public dataset [35]
[50] | Eye | Eye closure | FD-NN, TL-VGG16, and TL-VGG19 | A real-time system based on the area of eye closure, using CNNs; three networks were introduced for eye-closure classification: FD-NN, TL-VGG16, and TL-VGG19. | Accuracy: FD-NN: 98.15%; TL-VGG16: 95.45%; TL-VGG19: 95% | ZJU gallery and prepared their own dataset
[51] | Eye | 34 eye-tracking features | RF and non-linear SVM | Used 34 features of eye-tracking signals to detect drowsiness; the features were extracted from overlapping epochs of the eye signals with different lengths, and the labels were derived from EEG signals. | Accuracy: RF: 88.37% to 91.18%, SVM: 77.1% to 82.62%; Sensitivity (10 s epochs): RF: 88.1%, SVM: 79.1% | Prepared their own dataset
[52] | Eye and mouth | PERCLOS, eye-closure duration, and average mouth-opening time | Mamdani fuzzy inference system | The state of the extracted parameters is determined through a cascade of regression-tree algorithms; a Mamdani fuzzy inference system then estimates the driver's state. | Accuracy: 95.5%; Precision: 93.3% | 300-W dataset [53]
[54] | Eye and mouth | Eye closure and mouth openness over a period of time | Circular Hough transform | The circular Hough transform is applied to check whether the mouth is open and whether the iris is detected; the driver's state is determined from these two measures (see the sketch following this table). | Accuracy: 94% | Prepared their own dataset
[55] | Eye and head | Eye-blinking frequency and head-tilting frequency | Template matching to detect the eyes; the frequencies of head tilting and eye blinking give the drowsiness level | The drowsiness level is computed from the frequencies of head tilting and eye blinking, on a scale of 0 to 100; if it reaches 100, a loud audible warning is triggered. | Accuracy: 99.59%; Precision: 97.86% | Prepared their own dataset
[56] | Mouth and eye | Proportion of closed-eye frames to total frames in 1 min, continuous eye-closure time, blinking frequency, and number of yawns in 1 min | Multiple CNNs-kernelized correlation filters for face tracking; a newly proposed algorithm for drowsiness detection | The multiple CNNs-kernelized correlation filters method tracks the face and extracts the image-based parameters; if the driver is found drowsy, an alert is issued. | Accuracy: 92% | CelebA dataset [57], YawDD dataset [58], and newly prepared video data
[59] | Face, hands, and behavior (head, eye, or mouth movements) | Facial expressions, behavioral features, head gestures, and hand gestures | SoftMax classifier | An architecture that uses four deep-learning models to extract four different types of features. | Accuracy: 85%; Sensitivity: 82%; Precision: 86.3% | NTHUDDD public dataset [35]
[17] | Eye, head, and mouth | Eye-closure duration, head nodding, and yawning | Two-stream CNN | Used multi-task cascaded CNNs to locate the mouth and eyes; extracted static and dynamic features from a partial facial image and partial facial optical flow, respectively; then combined the features to classify the image data. | Accuracy: 97.06%; Sensitivity: 96.74%; Precision: 97.03% | NTHUDDD public dataset [35]
[63] | Eye, mouth, head, and scene conditions | Facial changes in the eyes, mouth, and head; the illumination conditions while driving; and whether glasses are worn | 3D deep CNN | A framework of four models that recognizes the driver's alertness status using a condition-adaptive representation. | Accuracy: 76.2% | NTHUDDD public dataset [35]
[33] | Eye, head, and mouth | Blinking rate, head nodding, and yawning frequency | Fisher score for feature selection and non-linear SVM for classification | A system based on a hand-crafted, compact face-texture descriptor that captures the most discriminative drowsiness features. | Accuracy: 79.84% | NTHUDDD public dataset [35]
[18] | Facial features | Face feature vectors | SVM | Used facial-motion information entropy extracted from real-time videos; the algorithm consists of four modules. | Accuracy: 94.32% | YawDD dataset [58]
[68] | Facial features and head movements | Implicitly learns the important features, such as eye closure, mouth position, chin or brow raises, frowning, and nose wrinkles | 3D CNN | Performed DDD based on activity prediction, through a depth-wise separable 3D CNN, using real-time face video; an advantage of this method is that it decides the important features implicitly rather than pre-specifying a feature set. | Accuracy: 73.9% | NTHUDDD public dataset [35]
[69] | Eye and mouth | Temporal facial feature vectors formed from spatial features | LSTM | Real-time DDD based on a combination of a CNN and an LSTM, consisting of two parts: spatial and temporal. | Accuracy: 84.85% | NTHUDDD public dataset [35]
[70] | Eye, head, and mouth | Yawning, eye closure, and head nodding | Multi-layer model-based 3D convolutional networks | Used a recurrent neural network (RNN)-based architecture, called multi-layer model-based 3D convolutional networks, to detect fatigue. | Accuracy: 97.3%; Sensitivity: 92%; Precision: 72% | NTHUDDD public dataset [35]
[72] | Eye and mouth | PERCLOS and mouth-opening degree | EM-CNN (eye and mouth CNN) | Applied face detection and feature-point localization using a multi-task cascaded CNN architecture, and EM-CNN to detect the eye and mouth states from the ROI. | Accuracy: 93.62%; Sensitivity: 93.64% | Driving-image dataset from the Biteda company
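For reference, the EAR used in [49] is commonly defined over six landmarks $p_1, \dots, p_6$ outlining one eye (the two corners plus upper- and lower-lid points). Whether [49] follows exactly this landmark convention is an assumption, but the standard form of the ratio is given below: as the eye closes, the vertical distances shrink and the EAR tends toward zero.

```latex
\mathrm{EAR} = \frac{\lVert p_2 - p_6 \rVert + \lVert p_3 - p_5 \rVert}{2\,\lVert p_1 - p_4 \rVert}
```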
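Several rows ([30], [52], [72]) use PERCLOS, the percentage of time within a window during which the eyes are (nearly) closed. The following is a minimal sketch of one way to compute it from per-frame eye-openness values; the 80%-closure criterion (P80), the frame rate, and the window length are illustrative assumptions, not parameters taken from the cited works.

```python
import numpy as np

def perclos(eye_openness, closed_threshold=0.2, fps=30, window_s=60):
    """PERCLOS over the most recent window: the fraction of frames in
    which the eye counts as closed.

    eye_openness: per-frame openness values in [0, 1] (e.g., a
    normalized EAR). A frame counts as closed when openness falls
    below closed_threshold; 0.2 approximates the common P80 criterion
    (eye at least 80% closed). All defaults are assumptions.
    """
    window = np.asarray(eye_openness[-fps * window_s:], dtype=float)
    return float(np.mean(window < closed_threshold))

# Hypothetical usage: 60 s of synthetic openness values at 30 fps.
rng = np.random.default_rng(0)
openness = rng.uniform(0.0, 1.0, size=30 * 60)
print(f"PERCLOS = {perclos(openness):.2f}")
```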
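For the circular-Hough-transform approach of [54], iris detection can be sketched with OpenCV's `HoughCircles`. This is only an illustration of the general technique, assuming a pre-cropped grayscale eye region; the file name and every parameter value below are assumptions, not settings from the paper.

```python
import cv2

# Hypothetical pre-cropped grayscale eye region (file name is an assumption).
eye = cv2.imread("eye_roi.png", cv2.IMREAD_GRAYSCALE)
eye = cv2.medianBlur(eye, 5)  # smoothing suppresses spurious circle votes

# Circular Hough transform: look for a single iris-sized circle.
circles = cv2.HoughCircles(
    eye,
    cv2.HOUGH_GRADIENT,
    dp=1,                    # accumulator resolution = image resolution
    minDist=eye.shape[0],    # at most one circle in the eye region
    param1=100,              # Canny high threshold
    param2=30,               # accumulator threshold: lower finds more circles
    minRadius=5,
    maxRadius=eye.shape[0] // 2,
)

# In the scheme of [54], a detected iris circle indicates an open eye.
iris_detected = circles is not None
print("iris detected:", iris_detected)
```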