Table 9.
Comparison of TinyML-based voice assistant models.
Model | Dataset | Accuracy (%) | Latency (ms) | Energy consumption
---|---|---|---|---
Transformer-based model | Common Voice dataset (multilingual) | 92.0 | 80–150 | High
Hybrid CNN-RNN | CHiME speech dataset (noisy environments) | 89.5 | 100–250 | Moderate
DNN | Gesture Dataset | 99.0 | 50–100 | High |
CNN | UrbanSound8K | 94.0 | 30–70 | Moderate |
RNN | AudioSet | 95.0 | 100–200 | High |
Decision Tree | ESC-50 | 90.0 | 10–30 | Low |
SVM | VoxForge dataset (multilingual speech) | 91.5 | 90–180 | Moderate