Table 2.
Overall performance of the random forest (RF), gpt-3.5-turbo, and BERT models.
| Models | Categories | RF P | RF R | RF F | GPT P | GPT R | GPT F | BERT P | BERT R | BERT F |
|---|---|---|---|---|---|---|---|---|---|---|
| One-step (end-to-end) | Diet | 0.910 | 0.919 | 0.915 | 0.793 | 0.970 | 0.873 | 0.981 | 0.966 | **0.973** |
| | Exercise | 0.946 | 0.906 | 0.926 | 0.711 | 1.000 | 0.831 | 0.953 | 0.979 | **0.965** |
| | Mental health | 0.916 | 0.784 | 0.844 | 0.485 | 0.979 | 0.648 | 0.951 | 0.943 | **0.945** |
| | Health literacy | 0.914 | 0.740 | 0.818 | 0.867 | 0.980 | 0.920 | 0.932 | 0.914 | **0.921** |
| | Unrelated | 0.902 | 0.977 | 0.938 | 1.000 | 0.599 | 0.749 | 0.970 | 0.954 | **0.962** |
| | *Macro* | *0.917* | *0.865* | *0.888* | *0.771* | *0.906* | *0.804* | *0.957* | *0.951* | ***0.953*** |
| Two-step | Diet | 0.876 | 0.928 | 0.899 | 0.932 | 0.970 | **0.951** | 0.918 | 0.971 | 0.943 |
| | Exercise | 0.890 | 0.938 | 0.909 | 0.795 | 0.969 | 0.873 | 0.910 | 0.953 | **0.927** |
| | Mental health | 0.832 | 0.889 | 0.856 | 0.688 | 0.979 | 0.808 | 0.888 | 0.897 | **0.892** |
| | Health literacy | 0.884 | 0.820 | 0.849 | 1.000 | 0.840 | **0.913** | 0.923 | 0.873 | 0.895 |
| | Unrelated | 0.957 | 0.937 | **0.947** | 0.966 | 0.854 | 0.906 | 0.957 | 0.937 | **0.947** |
| | *Macro* | *0.888* | *0.902* | *0.892* | *0.876* | *0.922* | *0.890* | *0.919* | *0.926* | ***0.921*** |

P, Precision; R, Recall; F, F1-score; RF, random forests; GPT, gpt-3.5-turbo.
The macro-averaged values are italicized and the highest F1-scores in each row are bold.
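The macro rows are the unweighted means of the five per-category scores in the same column. A minimal sketch to reproduce one of them, using the one-step random-forest F1 column from the table (category names and values taken directly from Table 2):

```python
# Macro-averaged F1: the unweighted mean of the per-category F1-scores.
# Values below are the one-step (end-to-end) random-forest F1 column of Table 2.
f1_per_category = {
    "Diet": 0.915,
    "Exercise": 0.926,
    "Mental health": 0.844,
    "Health literacy": 0.818,
    "Unrelated": 0.938,
}

macro_f1 = sum(f1_per_category.values()) / len(f1_per_category)
print(round(macro_f1, 3))  # 0.888, matching the table's macro row
```

Note that the macro average weights every category equally regardless of how many samples it contains, which is why it can differ from an accuracy-style (micro) average on imbalanced data.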