Table 2.
All deep learning methods are timed on the test dataset of 255,701 samples.
| Model | # Params. | Batch | Hardware | Time (s) | Speedup |
|---|---|---|---|---|---|
| NUPACK 3 | N/A | N/A | 64-core VM | 372.59 | 1.00 |
| RoBERTa | 6.1M | 1024 | RTX 3090 | 388.44 ± 0.32 | 0.96 |
| RNN | 249K | 8192 | RTX 3090 | 15.87 ± 0.10 | 23.47 |
|  |  | 4096 | TPUv2 | 3.60 ± 0.11 | 103.50 |
| CNN | 2.8M | 512 | RTX 3090 | 23.84 ± 0.08 | 15.63 |
|  |  | 4096 | TPUv2 | **1.23 ± 0.17** | **301.74** |
|  | 470K | 512 | RTX 3090 | 9.01 ± 0.00 | 41.34 |
|  |  | 4096 | TPUv2 | 1.28 ± 0.15 | 290.21 |
Bold marks the best model by time/speedup.
The average execution time and its standard deviation are reported in seconds. Each deep learning method is run 10 times after an initial warm-up run. The time taken to load the dataset into memory is not counted, and the batch size was chosen to minimise inference time. All deep learning models use consumer hardware or openly available hardware (the TPU platform is completely free to use).
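For reference, a minimal sketch of this timing protocol, assuming a generic `predict_fn` callable and pre-batched, in-memory data (the framework and exact harness are assumptions, not specified by the table):

```python
import time
import statistics

def benchmark(predict_fn, batches, n_runs=10):
    """Return (mean, std) of inference wall time in seconds.

    `predict_fn` is a hypothetical stand-in for a model's forward
    pass; `batches` are already loaded in memory, so data loading
    is excluded from the measurement, as in Table 2.
    """
    for batch in batches:          # warm-up run, not timed
        predict_fn(batch)
    timings = []
    for _ in range(n_runs):        # 10 timed runs
        start = time.perf_counter()
        for batch in batches:
            predict_fn(batch)
        # NB: on a GPU/TPU, a device synchronisation is needed here
        # before stopping the clock, since execution is asynchronous.
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)

# Speedup is the baseline wall time divided by the model's wall time,
# e.g. for the RNN on TPUv2: 372.59 / 3.60 ≈ 103.5 (cf. Table 2).
```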