Table 2. Methodological details of included studies that classify UD^a, BD^b, and HC^c.
| Study | Data recording | Data used | Data preprocessing | Specific variable or feature selection | Machine learning models or statistical test | Validation | Findings |
|---|---|---|---|---|---|---|---|
| **Wearable devices** | | | | | | | |
| Anmella et al [35], 2023 | 2 d | Physiological data with wearable devices (Empatica E4) | Rules-based filter for invalid physiological data; time unit set to 1 second | X-, Y-, and Z-axis acceleration, blood volume pulse, electrodermal activity, heart rate, and skin temperature | BiLSTM^d | —^e | 7-class classification task: ACC^f=0.7; AUROC^g=0.69; F1-score=0.6927 |
| Zakariah and Alotaibi [36], 2023 | 5-20 d | General levels of activity with a wearable Actiwatch | Imputation techniques (mean, median, or regression-based imputation) for missing values; categorical variables transformed into numerical representations | Motor activity measurement from the Actiwatch | UMAP^h and NN^i | Leave-one-out validation | 4-class classification task: ACC=0.991; F1-score=0.9887 |
| **Audiovisual recordings** | | | | | | | |
| Yang et al [37], 2016 | 1 d | Speech responses to 5 questions after participants watched 6 videos | Silence removal and speech segmentation based on energy and spectral centroid as features for threshold definition | Emotion profiles, 39 dimensions of Mel-frequency cepstral coefficients, and acoustic features of 384 dimensions | SVM^j, MLP^k, LSTM^l, and BiLSTM | 13-fold cross-validation | 3-class classification task: optimal ACC=76.92% |
| Su et al [38], 2017 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of a time interval and segmentation of each facial image into 12 mutually independent facial regions | 8 basic orientations of the motion vector in microscopic facial expressions | HMM^m and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=67.7% |
| Hong et al [39], 2018 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of a time interval and alignment of facial points to a new coordinate system | 12 action units | MLP, SVM, GMM^n, and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=61.1% |
| Huang et al [40], 2019 | 1 d | Speech responses from interviews with a clinician after participants watched 6 videos | Use of a hierarchical spectral clustering algorithm for database adaptation | Emotion profiles and 32-dimensional acoustic features | SVM, CNN^o, and LSTM | Leave-one-out cross-validation | 3-class classification task: optimal ACC=75.56% |
| Su et al [41], 2020 | 1 d | Facial expressions and speech responses from interviews with a clinician after participants watched 6 videos | Hierarchical spectral clustering and denoising autoencoder method for database adaptation | Emotion profiles, action units, 384 acoustic features, and 49 facial expression feature points | SVM, HMM, MLP, GRU^p, CNN, RNN^q, and LSTM | 13-fold cross-validation | 3-class classification task: optimal ACC=76.9% |
| Hong et al [42], 2021 | 1 d | Facial expressions elicited by 6 emotional video clips | Selection of four 4-second intervals per elicitation video based on the facial expression intensity of all participants | Action units for macroscopic facial expressions and motion vectors for microscopic facial expressions | MLP, NN, and LSTM | 12-fold cross-validation | 3-class classification task: optimal ACC=72.2% |
| Luo et al [43], 2024 | 1 d | Voice signals collected from 7 pieces of reading material | Power normalization and speech segmentation into 7 parts corresponding to the 7 reading materials | 120 vocal features for classification, such as the mean root-mean-square energy | DT^r, NB^s, SVM, KNN^t, EL^u, and CNN | — | 3-class classification task: optimal ACC=95.6% |
| **Multimodal technology** | | | | | | | |
| Wu et al [44], 2024 | 1 d | Text, audio, facial attributes, heart rate, and eye movement captured with mobile devices while participants conversed with a virtual assistant | — | Word embeddings; 5 spectral features, facial attribute embeddings, 23 heart rate variability indices, and 7 eye movement features (fixation and saccade) | RF^v, LSTM, and DT | 5-fold cross-validation | 5-class classification task: optimal ACC=90.26% |
^a UD: unipolar depression.
^b BD: bipolar disorder.
^c HC: healthy control.
^d BiLSTM: bidirectional long short-term memory.
^e Not available.
^f ACC: accuracy.
^g AUROC: area under the receiver operating characteristic curve.
^h UMAP: uniform manifold approximation and projection.
^i NN: neural network.
^j SVM: support vector machine.
^k MLP: multilayer perceptron.
^l LSTM: long short-term memory.
^m HMM: hidden Markov model.
^n GMM: Gaussian mixture model.
^o CNN: convolutional neural network.
^p GRU: gated recurrent unit.
^q RNN: recurrent neural network.
^r DT: decision tree.
^s NB: naive Bayes.
^t KNN: k-nearest neighbor.
^u EL: ensemble learning.
^v RF: random forest.
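Several of the models summarized in Table 2 are recurrent sequence classifiers applied to windowed multichannel signals. The sketch below is purely illustrative and is not any included study's implementation: it shows, in PyTorch, the general shape of a BiLSTM classifier over wearable physiological channels such as those listed for Anmella et al [35] (acceleration, blood volume pulse, electrodermal activity, heart rate, and skin temperature); the window length, hidden size, and any training details are assumptions for illustration only.

```python
# Illustrative sketch only: a BiLSTM classifier over windowed wearable signals.
# 7 input channels (3-axis acceleration, BVP, EDA, heart rate, skin temperature)
# at 1-second resolution, mapped to 7 output classes. Sizes are assumed values.
import torch
import torch.nn as nn

class WearableBiLSTM(nn.Module):
    def __init__(self, n_channels=7, hidden=64, n_classes=7):
        super().__init__()
        # Bidirectional LSTM runs over the time dimension of each window
        self.lstm = nn.LSTM(input_size=n_channels, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        # Linear head maps the concatenated forward/backward states to class scores
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, time_steps, n_channels)
        out, _ = self.lstm(x)
        # Use the last time step's hidden state (both directions) for classification
        return self.head(out[:, -1, :])

# Example: a batch of 8 windows, each 60 one-second samples of 7 channels
model = WearableBiLSTM()
windows = torch.randn(8, 60, 7)
logits = model(windows)  # shape (8, 7): scores for a 7-class task
print(logits.shape)
```

In practice, the studies in Table 2 evaluate such models with subject-aware schemes (for example, leave-one-out or k-fold cross-validation) rather than a single random split, which is why the Validation column is reported separately from the model architecture.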