Table 4. Performance scores.

**a. Sentence task**

| Model | Training data | Precision | Recall | F1 score |
|---|---|---|---|---|
| LSTM | Original | 0.28 | 0.20 | 0.23 |
| LSTM | Balanced | 0.10 | 0.96 | 0.19 |
| LSTM | Under-sampling | 0.41 | 0.33 | 0.37 |
| Bi-LSTM | Original | 0.35 | 0.33 | 0.34 |
| Bi-LSTM | Balanced | 0.15 | 0.86 | 0.26 |
| Bi-LSTM | Under-sampling | 0.33 | 0.46 | 0.38 |
| BERT | Original | 0.43 | 0.23 | 0.30 |
| BERT | Balanced | 0.03 | 0.56 | 0.07 |
| BERT | Under-sampling | 0.45 | 0.66 | 0.54 |
**b. User task**

| Model | Training data | Precision | Recall | F1 score |
|---|---|---|---|---|
| LSTM | Original | 0.66 | 0.52 | 0.58 |
| LSTM | Balanced | 0.23 | 1.00 | 0.37 |
| LSTM | Under-sampling | 0.57 | 0.57 | 0.57 |
| Bi-LSTM | Original | 0.65 | 0.68 | 0.66 |
| Bi-LSTM | Balanced | 0.30 | 1.00 | 0.46 |
| Bi-LSTM | Under-sampling | 0.50 | 0.68 | 0.57 |
| BERT | Original | 0.53 | 0.36 | 0.43 |
| BERT | Balanced | 0.13 | 0.93 | 0.23 |
| BERT | Under-sampling | 0.63 | 0.82 | 0.71 |
Precision, recall and F1 scores are shown for the sentence task (a) and the user task (b). The deep-learning NLP models used in this study are LSTM, Bi-LSTM and BERT. The percentage of positive data in the training dataset is approximately 2.7% for “Original” (the same ratio as in the original population), 50% for “Balanced” and 5% for “Under-sampling”.
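The table caption summarizes how the training sets were rebalanced and how the metrics are reported; the paper does not include code for this step, so the following is only a minimal sketch, assuming binary labels in {0, 1} with 1 as the positive class. The helper names `undersample` and `report` are illustrative, not from the original study. It shows how negatives could be randomly dropped to reach a target positive ratio (e.g. 0.05 for “Under-sampling”) and how precision, recall and F1 for the positive class would be computed on an untouched test set.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def undersample(X, y, positive_ratio, seed=0):
    """Randomly drop negatives until positives make up `positive_ratio`
    of the training set (e.g. 0.027 "Original", 0.5 "Balanced", 0.05 "Under-sampling")."""
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    # number of negatives so that pos / (pos + neg) == positive_ratio
    n_neg = int(len(pos_idx) * (1 - positive_ratio) / positive_ratio)
    keep_neg = rng.choice(neg_idx, size=min(n_neg, len(neg_idx)), replace=False)
    keep = np.concatenate([pos_idx, keep_neg])
    rng.shuffle(keep)
    return X[keep], y[keep]

def report(y_true, y_pred):
    """Precision, recall and F1 for the positive class, as reported in Table 4."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1
    )
    return {"precision": p, "recall": r, "f1": f1}
```

Note that only the training set is resampled in this sketch; the test set keeps the original class ratio so that the reported scores reflect performance on the natural distribution.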