2025 May 23;27:e72229. doi: 10.2196/72229

Table 2.

Methodological details of included studies that classify UD^a, BD^b, and HC^c.

| Study | Data recording | Data used | Data preprocessing | Specific variable or feature selection | Machine learning models or statistical test | Validation | Findings |
|---|---|---|---|---|---|---|---|
| **Wearable devices** | | | | | | | |
| Anmella et al [35], 2023 | 2 d | Physiological data from wearable devices (Empatica E4) | Rules-based filter for invalid physiological data; time unit set to 1 second | X-, Y-, or Z-axis acceleration, blood volume pulse, electrodermal activity, heart rate, and skin temperature | BiLSTM^d | —^e | 7-class classification task: ACC^f=0.7; AUROC^g=0.69; F1-score=0.6927 |
| Zakariah and Alotaibi [36], 2023 | 5-20 d | General activity levels from a wearable Actiwatch | Imputation (mean, median, or regression-based) for missing values; categorical variables transformed into numerical representations | Motor activity measurement from the Actiwatch | UMAP^h and NN^i | Leave-one-out validation | 4-class classification task: ACC=0.991; F1-score=0.9887 |
| **Audiovisual recordings** | | | | | | | |
| Yang et al [37], 2016 | 1 d | Speech responses to 5 questions after participants watched 6 videos | Silence removal and speech segmentation using energy and spectral centroid as features for threshold definition | Emotion profiles, 39-dimensional Mel-frequency cepstral coefficients, and 384-dimensional acoustic features | SVM^j, MLP^k, LSTM^l, and BiLSTM | 13-fold cross-validation | 3-class classification task: optimal ACC=76.92% |
| Su et al [38], 2017 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of a time interval and segmentation of each facial image into 12 mutually independent facial regions | 8 basic orientations of the motion vector in microscopic facial expressions | HMM^m and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=67.7% |
| Hong et al [39], 2018 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of a time interval; facial points aligned to a new coordinate system | 12 action units | MLP, SVM, GMM^n, and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=61.1% |
| Huang et al [40], 2019 | 1 d | Speech responses from clinician interviews after participants watched 6 videos | Hierarchical spectral clustering algorithm for database adaptation | Emotion profiles and 32-dimensional acoustic features | SVM, CNN^o, and LSTM | Leave-one-out cross-validation | 3-class classification task: optimal ACC=75.56% |
| Su et al [41], 2020 | 1 d | Facial expressions and speech responses from clinician interviews after participants watched 6 videos | Hierarchical spectral clustering and denoising autoencoder for database adaptation | Emotion profiles, action units, 384 acoustic features, and 49 facial expression feature points | SVM, HMM, MLP, GRU^p, CNN, RNN^q, and LSTM | 13-fold cross-validation | 3-class classification task: optimal ACC=76.9% |
| Hong et al [42], 2021 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of four 4-second intervals per elicitation video based on the facial expression intensity of all participants | Action units for macroscopic facial expressions and motion vectors for microscopic facial expressions | MLP, NN, and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=72.2% |
| Luo et al [43], 2024 | 1 d | Voice signals collected from 7 pieces of reading material | Power normalization and speech segmentation into 7 parts corresponding to the 7 reading materials | 120 vocal features, such as the mean root-mean-square energy | DT^r, NB^s, SVM, KNN^t, EL^u, and CNN | | 3-class classification task: optimal ACC=95.6% |
| **Multimodal technology** | | | | | | | |
| Wu et al [44], 2024 | 1 d | Text, audio, facial attributes, heart rate, and eye movement recorded with mobile devices while participants converse with a virtual assistant | Word embedding | 5 spectral features, facial attribute embedding, 23 heart rate variability indices, and 7 eye movement features (fixation and saccade) | RF^v, LSTM, and DT | 5-fold cross-validation | 5-class classification task: optimal ACC=90.26% |
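Most studies in the table validate with k-fold or leave-one-out cross-validation. A minimal pure-Python sketch of how these two schemes partition a sample into train/test index sets (the sample sizes here are illustrative, not taken from the studies):

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are split into k contiguous, near-equal folds; each fold
    serves as the test set exactly once.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is the special case k == n_samples."""
    return k_fold_indices(n_samples, n_samples)


# Illustrative 12-fold split (the 12-fold scheme appears in several rows;
# a sample size of 12 is an assumption for this sketch, not study data).
folds = list(k_fold_indices(12, 12))
assert len(folds) == 12
assert all(len(test) == 1 for _, test in folds)
```

In practice a library cross-validator (e.g., scikit-learn's `KFold` or `LeaveOneOut`) would be used; this sketch only shows how the validation column's schemes carve up the participants.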

^a UD: unipolar depression.
^b BD: bipolar disorder.
^c HC: healthy control.
^d BiLSTM: bidirectional long short-term memory.
^e Not available.
^f ACC: accuracy.
^g AUROC: area under the receiver operating characteristic curve.
^h UMAP: uniform manifold approximation and projection.
^i NN: neural network.
^j SVM: support vector machine.
^k MLP: multilayer perceptron.
^l LSTM: long short-term memory.
^m HMM: hidden Markov model.
^n GMM: Gaussian mixture model.
^o CNN: convolutional neural network.
^p GRU: gated recurrent unit.
^q RNN: recurrent neural network.
^r DT: decision tree.
^s NB: naive Bayes.
^t KNN: k-nearest neighbor.
^u EL: ensemble learning.
^v RF: random forest.
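Zakariah and Alotaibi [36] handle missing actigraphy values with mean, median, or regression-based imputation. A minimal sketch of the first two strategies, using stdlib only; the activity values are hypothetical, not from the study:

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical motor-activity counts with two missing epochs.
activity = [1.0, None, 3.0]
assert impute(activity, "mean") == [1.0, 2.0, 3.0]      # gap filled with mean
assert impute(activity, "median") == [1.0, 2.0, 3.0]    # here median == mean
```

Median imputation is the more robust choice when the observed values contain outliers, since a single extreme count does not shift the fill value; regression-based imputation (the study's third option) would instead predict each gap from the other variables.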