Author manuscript; available in PMC: 2021 Nov 5.
Published in final edited form as: Proc AAAI Conf Artif Intell. 2021 May 18;35(16):14138–14148.

Table 1:

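Memory consumption and running time for various input sequence lengths. We report the average memory consumption (MB) and running time (ms) of the self-attention module for a single input instance at each input length n. Numbers in parentheses are improvement factors relative to the standard Transformer (the Transformer's value divided by the model's value), so values above 1× indicate lower memory use or faster execution, and values below 1× indicate slower execution.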

| Model | n = 512, memory (MB) | n = 512, time (ms) | n = 2048, memory (MB) | n = 2048, time (ms) | n = 8192, memory (MB) | n = 8192, time (ms) |
|---|---|---|---|---|---|---|
| Transformer | 54 (1×) | 0.8 (1×) | 685 (1×) | 10.0 (1×) | 10233 (1×) | 155.4 (1×) |
| Linformer-256 | 41 (1.3×) | 0.7 (1.1×) | 165 (4.2×) | 2.7 (3.6×) | 635 (16.1×) | 11.3 (13.8×) |
| Longformer-257 | 32.2 (1.7×) | 2.4 (0.3×) | 130 (5.3×) | 9.2 (1.0×) | 455 (22.5×) | 36.2 (4.3×) |
| Nyströmformer-64 | 35 (1.5×) | 0.7 (1.1×) | 118 (5.8×) | 2.7 (3.6×) | 450 (22.8×) | 12.3 (12.7×) |
| Nyströmformer-32 | 26 (2.1×) | 0.6 (1.2×) | 96 (7.1×) | 2.6 (3.7×) | 383 (26.7×) | 11.5 (13.4×) |
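To make the parenthetical factors concrete, the short sketch below recomputes them from the raw table values, and also estimates how memory grows with n between 2048 and 8192 as a log-log slope (roughly 2 for quadratic growth, roughly 1 for linear). The helper names `factor` and `growth_exponent` are hypothetical and chosen only for this illustration; the numbers are transcribed from Table 1, not re-measured.

```python
import math

# Raw (memory in MB, time in ms) values transcribed from Table 1, keyed by input length n.
TABLE = {
    "Transformer":      {512: (54, 0.8),   2048: (685, 10.0), 8192: (10233, 155.4)},
    "Linformer-256":    {512: (41, 0.7),   2048: (165, 2.7),  8192: (635, 11.3)},
    "Longformer-257":   {512: (32.2, 2.4), 2048: (130, 9.2),  8192: (455, 36.2)},
    "Nyströmformer-64": {512: (35, 0.7),   2048: (118, 2.7),  8192: (450, 12.3)},
    "Nyströmformer-32": {512: (26, 0.6),   2048: (96, 2.6),   8192: (383, 11.5)},
}


def factor(baseline: float, value: float) -> float:
    """Improvement factor relative to the baseline: > 1 means smaller or faster."""
    return baseline / value


def growth_exponent(v_small: float, v_large: float, n_small: int, n_large: int) -> float:
    """Slope on a log-log scale: about 1 for linear growth in n, about 2 for quadratic."""
    return math.log(v_large / v_small) / math.log(n_large / n_small)


base = TABLE["Transformer"]
for model, row in TABLE.items():
    # Memory factor at the longest input, relative to the standard Transformer.
    mem_factor = factor(base[8192][0], row[8192][0])
    # Empirical memory growth exponent between n = 2048 and n = 8192.
    mem_exp = growth_exponent(row[2048][0], row[8192][0], 2048, 8192)
    print(f"{model:17s} memory factor at n=8192: {mem_factor:5.1f}x  growth exponent: {mem_exp:.2f}")
```

Under these transcribed values, the Transformer's memory grows with an exponent of about 1.95 (close to quadratic), while Nyströmformer-32 grows with an exponent of about 1.00 (close to linear). A few recomputed factors differ from the table in the last digit (for example 22.7× versus 22.8× for Nyströmformer-64 at n = 8192), presumably because the published factors were computed from unrounded measurements.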