Author manuscript; available in PMC: 2022 May 11.
Published in final edited form as: Proc IEEE Int Conf Comput Vis. 2021 Oct;2021:13557–13567. doi: 10.1109/iccv48922.2021.01332

Table 3: Comparison of VidTr to other fast networks. We present the number of views used for evaluation and FLOPs required for each view. The latency denotes the total time required to get the reported top-1 score.

Model            Input   Res.  GFLOPs  Latency (ms)  top-1

TSM [37]         8fTSN   256   69      29            74.7
TEA [34]         16×4    256   70      -             76.1
3DEffi-B4 [18]   16×5    224   7       -             72.4
TEINet [39]      16×4    256   33      36            74.9
X3D-M [18]       16×5    224   5       40.9          74.6
X3D-L [18]       16×5    312   19      59.4          76.8

C-VidTr-S        8×8     224   39      17.5          75.7
C-VidTr-M        16×4    224   59      26.1          76.7