Table 4:

Paragraph-level performance of the evidence retriever module. The overall precision, recall and F1-score are macro-averaged, i.e., the unweighted mean of the per-label scores. Evidence prediction is the main task, whereas SA and SI prediction are auxiliary tasks that help the model align the vector representations of the paragraphs for hospital-stay-level suicidal behavior prediction.

Paragraph Evidence Prediction				Paragraph SA Prediction				Paragraph SI Prediction
Evidence	P	R	F	Labels	P	R	F	Labels	P	R	F
Yes	0.79	0.87	0.83	Positive	0.71	0.74	0.73	Positive	0.46	0.62	0.53
No	0.95	0.91	0.93	Neg_Unsure	0.19	0.26	0.22	Negative	0.38	0.46	0.42
-	-	-	-	Neutral-SA	0.95	0.92	0.93	Neutral-SI	0.98	0.99	0.98
Overall	0.87	0.89	0.88	Overall	0.62	0.64	0.63	Overall	0.61	0.69	0.64

P: Precision, R: Recall and F: F1-score.
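
As a sanity check on the macro-averaged "Overall" rows, the following minimal sketch recomputes them from the per-label scores reported in Table 4. The names `PER_LABEL`, `macro_average` and `overall_row` are illustrative and do not come from the original implementation; the sketch assumes that each overall score is the unweighted mean of the corresponding per-label scores, rounded to two decimals.

```python
# Per-label (P, R, F) scores copied from Table 4.
# The dictionary keys "evidence", "sa" and "si" are illustrative task names.
PER_LABEL = {
    "evidence": {
        "Yes": (0.79, 0.87, 0.83),
        "No": (0.95, 0.91, 0.93),
    },
    "sa": {
        "Positive": (0.71, 0.74, 0.73),
        "Neg_Unsure": (0.19, 0.26, 0.22),
        "Neutral-SA": (0.95, 0.92, 0.93),
    },
    "si": {
        "Positive": (0.46, 0.62, 0.53),
        "Negative": (0.38, 0.46, 0.42),
        "Neutral-SI": (0.98, 0.99, 0.98),
    },
}


def macro_average(scores):
    """Unweighted mean of per-label scores (every label counts equally)."""
    return sum(scores) / len(scores)


def overall_row(task):
    """Return the macro-averaged (P, R, F) for one task, rounded to 2 decimals."""
    rows = PER_LABEL[task].values()
    # zip(*rows) regroups the rows into a P column, an R column and an F column.
    return tuple(round(macro_average(column), 2) for column in zip(*rows))


if __name__ == "__main__":
    for task in PER_LABEL:
        print(task, overall_row(task))
    # evidence (0.87, 0.89, 0.88)
    # sa (0.62, 0.64, 0.63)
    # si (0.61, 0.69, 0.64)
```

The recomputed values match the "Overall" rows of the table. The reported overall F values are consistent with averaging the per-label F1-scores rather than with taking the harmonic mean of the overall precision and recall.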