Table 7. Validation results for the deep learning models on the 110K review dataset.
| Method | Accuracy | Std. dev. | Class | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| RNN | 0.957 | 0.0006 | Positive | 0.981 | 0.982 | 0.982 |
| | | | Neutral/Mixed | 0.751 | 0.773 | 0.762 |
| | | | Negative | 0.930 | 0.907 | 0.919 |
| GRU | 0.958 | 0.0002 | Positive | 0.978 | 0.985 | 0.981 |
| | | | Neutral/Mixed | 0.774 | 0.751 | 0.763 |
| | | | Negative | 0.930 | 0.907 | 0.918 |
| LSTM | 0.958 | 0.0009 | Positive | 0.979 | 0.984 | 0.982 |
| | | | Neutral/Mixed | 0.766 | 0.754 | 0.760 |
| | | | Negative | 0.931 | 0.909 | 0.920 |
| BERT | 0.975 | 0.0001 | Positive | 0.989 | 0.990 | 0.989 |
| | | | Neutral/Mixed | 0.854 | 0.846 | 0.850 |
| | | | Negative | 0.954 | 0.951 | 0.952 |