. 2021 Aug 5;75:103231. doi: 10.1016/j.scs.2021.103231

Table 6.

Overview of pipeline parallelism with 2 GPU cards.

With pipeline parallelism
GPU-0	FP-Batch 1	FP-Batch 2	Idle	BP-Batch 1	FP-Batch 3	BP-Batch 2	FP-Batch 4	BP-Batch 3
GPU-1	Idle	FP-Batch 1	BP-Batch 1	FP-Batch 2	BP-Batch 2	FP-Batch 3	BP-Batch 3	FP-Batch 4
Master Thread	Store random initial weights of encoder model and outputs from GPU-0 for FP-Batch 1	Store random initial weights of encoder model and outputs from GPU-0 for FP-Batch 2	No operations	Retrieve initial weights data and outputs associated with the encoder model for use in BP-Batch 1 and FP-Batch 2, respectively	Erase initial weights data and outputs associated with the encoder model from Batch 1 runs; store initial weights of encoder model and outputs from GPU-0 for FP-Batch 3	Retrieve initial weights data and outputs associated with the encoder model for use in BP-Batch 2 and FP-Batch 3, respectively	Erase initial weights data and outputs associated with the encoder model from Batch 2 runs; store initial weights of encoder model and outputs from GPU-0 for FP-Batch 4	Retrieve initial weights data and outputs associated with the encoder model for use in BP-Batch 3 and FP-Batch 4, respectively
Without pipeline parallelism
GPU-0	FP-Batch 1	Idle	Idle	BP-Batch 1	FP-Batch 2	Idle	Idle	BP-Batch 2
GPU-1	Idle	FP-Batch 1	BP-Batch 1	Idle	Idle	FP-Batch 2	BP-Batch 2	Idle

* FP represents forward-pass (e.g., FP-Batch 1 means forward pass for mini-batch 1).

** BP represents back-propagation (e.g., BP-Batch 1 means back-propagation for mini-batch 1).
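
The benefit of the pipelined schedule can be made concrete by counting idle slots. The sketch below is illustrative only and is not taken from the paper: it transcribes the two GPU rows of the "With pipeline parallelism" part of Table 6, regenerates the four-steps-per-batch pattern of the "Without pipeline parallelism" part, and computes the fraction of non-idle slots. It assumes every table column is a time slot of equal duration, which the table does not state; the names `sequential_schedule`, `PIPELINED`, and `utilization` are hypothetical.

```python
IDLE = "Idle"


def sequential_schedule(num_batches):
    """Two-GPU schedule without pipelining.

    Each mini-batch occupies four consecutive time slots:
    FP on GPU-0, FP on GPU-1, BP on GPU-1, BP on GPU-0.
    """
    gpu0, gpu1 = [], []
    for b in range(1, num_batches + 1):
        gpu0 += [f"FP-Batch {b}", IDLE, IDLE, f"BP-Batch {b}"]
        gpu1 += [IDLE, f"FP-Batch {b}", f"BP-Batch {b}", IDLE]
    return {"GPU-0": gpu0, "GPU-1": gpu1}


# Pipelined schedule transcribed from the GPU rows of Table 6 (eight slots).
PIPELINED = {
    "GPU-0": ["FP-Batch 1", "FP-Batch 2", IDLE, "BP-Batch 1",
              "FP-Batch 3", "BP-Batch 2", "FP-Batch 4", "BP-Batch 3"],
    "GPU-1": [IDLE, "FP-Batch 1", "BP-Batch 1", "FP-Batch 2",
              "BP-Batch 2", "FP-Batch 3", "BP-Batch 3", "FP-Batch 4"],
}


def utilization(schedule):
    """Fraction of (GPU, slot) cells that are not idle.

    Assumes all slots have equal duration (an assumption of this sketch).
    """
    slots = [cell for row in schedule.values() for cell in row]
    return sum(cell != IDLE for cell in slots) / len(slots)


if __name__ == "__main__":
    print("pipelined utilization: ", utilization(PIPELINED))
    print("sequential utilization:", utilization(sequential_schedule(2)))
```

Under the equal-slot assumption, the pipelined rows leave 2 of 16 cells idle over the eight slots shown, whereas the non-pipelined rows leave 8 of 16 idle. Real forward and backward passes rarely take the same time, so these fractions describe the table layout rather than measured speed-up.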