CNN-based methods
Standard CNN (refs. 122–127) | Standard CNN structure: a convolutional layer, a pooling layer, and a fully connected layer. Some studies also incorporate other textual features (e.g., POS, LIWC, BoW).
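As an illustration, the convolution–pooling–classification pipeline of a standard text CNN can be sketched as follows. All names, dimensions, and weights here are hypothetical (random, not trained); the sketch only shows the data flow, not any cited study's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_dim = 10, 8        # 10 tokens, 8-dim word embeddings
n_filters, width = 4, 3         # 4 convolutional filters of width 3

x = rng.standard_normal((seq_len, emb_dim))         # embedded post
filters = rng.standard_normal((n_filters, width, emb_dim))

# Convolutional layer: slide each filter over token windows.
conv = np.array([
    [np.sum(x[t:t + width] * f) for t in range(seq_len - width + 1)]
    for f in filters
])
conv = np.maximum(conv, 0)                          # ReLU activation

# Pooling layer: max-over-time pooling yields one value per filter.
pooled = conv.max(axis=1)                           # shape (n_filters,)

# Fully connected layer: map pooled features to class probabilities.
w_out = rng.standard_normal((2, n_filters))         # 2 classes
logits = w_out @ pooled
probs = np.exp(logits) / np.exp(logits).sum()       # softmax
```

Max-over-time pooling is what makes the representation length-invariant: each filter contributes its single strongest response regardless of where in the post it fired.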
|
Multi-Gated LeakyReLU CNN (MGL-CNN) (ref. 128) | Two hierarchical neural network models (post-level and user-level) with gated units and convolutional networks.
|
Graph model combined with convolutional neural network (ref. 129) | A unified hybrid model that combines a CNN with a factor graph model, leveraging both social interactions and content.
RNN-based methods
LSTM or GRU, some with an attention mechanism (refs. 32, 133, 136, 232–234) | Standard RNN structure: Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) networks; some studies add an attention mechanism.
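The attention mechanism added on top of an LSTM/GRU is typically a weighted pooling over the per-token hidden states. A minimal sketch, assuming a generic dot-product scorer against a learned context vector (the exact scoring function varies across the cited studies):

```python
import numpy as np

rng = np.random.default_rng(1)

seq_len, hidden = 6, 5
h = rng.standard_normal((seq_len, hidden))   # LSTM/GRU output per token

# Score each hidden state against a learned context vector u.
u = rng.standard_normal(hidden)
scores = h @ u                               # one score per time step
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over time

# Attended representation: weighted sum of hidden states.
context = weights @ h                        # shape (hidden,)
```

The softmax weights also make the model interpretable: inspecting `weights` shows which tokens (or, at the sentence level in a HAN, which sentences) the classifier attended to.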
|
Hierarchical Attention Network (HAN) with GRU (ref. 138) | A GRU with a word-level attention layer and a sentence-level attention layer.
|
LSTM with transfer learning (refs. 140, 141) | Transfer learning is used to pre-train the model on an open dataset.
|
LSTM or GRU with multi-task learning (refs. 142, 235–237) | Multi-task learning is used to improve illness detection; the auxiliary tasks include multi-risk behavior classification, severity score prediction, word vector classification, and sentiment classification.
|
LSTM or GRU with reinforcement learning (refs. 143, 144) | Reinforcement learning is used to automatically select the important posts.
|
LSTM or GRU with multiple instance learning (refs. 145, 146) | Multiple instance learning is used to estimate the probability of post-level labels and thereby improve the prediction of user-level labels.
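The multiple-instance step amounts to aggregating post-level probabilities into one user-level probability. The numbers and the "noisy-or" rule below are purely illustrative choices, not necessarily those of the cited studies:

```python
# Hypothetical per-post risk probabilities for one user.
post_probs = [0.05, 0.10, 0.70, 0.20]

# Noisy-or aggregation: the user is positive if at least one
# post is positive, assuming posts are independent.
user_prob = 1.0
for p in post_probs:
    user_prob *= (1.0 - p)
user_prob = 1.0 - user_prob

# Max pooling is a simpler alternative aggregator: the user-level
# score is just the single most indicative post.
user_prob_max = max(post_probs)
```

Either aggregator lets the model be trained with only user-level labels while still producing post-level scores as a by-product.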
|
SISMO (ref. 139) | An ordinal hierarchical LSTM attention model.
Transformer-based methods
Self-attention models (refs. 148, 149) | These use the encoder structure of the Transformer, whose core component is the self-attention module.
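The self-attention module at the core of the Transformer encoder computes scaled dot-product attention over the whole sequence at once. A minimal single-head sketch with random (untrained) projection weights:

```python
import numpy as np

rng = np.random.default_rng(2)

seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))        # token embeddings

# Learned linear projections to queries, keys, and values.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
q, k, v = x @ w_q, x @ w_k, x @ w_v

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
scores = q @ k.T / np.sqrt(d_model)
weights = np.exp(scores)
weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
out = weights @ v                                  # shape (seq_len, d_model)
```

Unlike an RNN, every token attends to every other token in one step, so long-range dependencies between posts or sentences do not have to survive a sequential hidden state.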
|
BERT-based models: BERT (refs. 150, 151), DistilBERT (ref. 152), RoBERTa (ref. 153), ALBERT (ref. 150), BioClinical BERT (ref. 31), XLNet (ref. 154), GPT-1 (ref. 155) | Various BERT-based pre-trained models.
Hybrid-based methods
LSTM + CNN (refs. 156–160) | Combines an LSTM with a CNN to extract both local features and sequence features.
|
STATENet (using a Transformer and an LSTM) (ref. 161) | A time-aware Transformer that combines emotional and historical information.
|
Sub-emotion network (refs. 164, 165, 238) | Integrates Bag-of-Sub-Emotions embeddings into an LSTM to capture emotional information.
|
Events and Personality traits for Stress Prediction (EPSP) model (ref. 239) | A joint memory network that learns the dynamics of a user's emotions and personality.
|
PHASE (ref. 166) | A time- and phase-aware model that learns historical emotional features from users.
|
Hyperbolic graph convolutional networks (ref. 167) | Combines hyperbolic graph convolutions with a Hawkes process to learn a user's historical emotional spectrum.