Table 6.
Overview of pipeline parallelism with 2 GPU cards.
| With pipeline parallelism | ||||||||
|---|---|---|---|---|---|---|---|---|
| GPU-0 | FP-Batch 1 | FP-Batch 2 | Idle | BP-Batch 1 | FP-Batch 3 | BP-Batch 2 | FP-Batch 4 | BP-Batch 3 |
| GPU-1 | Idle | FP-Batch 1 | BP-Batch 1 | FP-Batch 2 | BP-Batch 2 | FP-Batch 3 | BP-Batch 3 | FP-Batch 4 |
| Master Thread | Store random initial weights of encoder model and outputs from GPU-0 for FP-Batch 1 | Store random initial weights of encoder model and outputs from GPU-0 for FP-Batch 2 | No operations | Retrieve initial weights and outputs associated with encoder model for use in BP-Batch 1 and FP-Batch 2, respectively | Erase initial weights and outputs associated with encoder model from Batch 1 runs; store initial weights of encoder model and outputs from GPU-0 for FP-Batch 3 | Retrieve initial weights and outputs associated with encoder model for use in BP-Batch 2 and FP-Batch 3, respectively | Erase initial weights and outputs associated with encoder model from Batch 2 runs; store initial weights of encoder model and outputs from GPU-0 for FP-Batch 4 | Retrieve initial weights and outputs associated with encoder model for use in BP-Batch 3 and FP-Batch 4, respectively |
| Without pipeline parallelism | ||||||||
| GPU-0 | FP-Batch 1 | Idle | Idle | BP-Batch 1 | FP-Batch 2 | Idle | Idle | BP-Batch 2 |
| GPU-1 | Idle | FP-Batch 1 | BP-Batch 1 | Idle | Idle | FP-Batch 2 | BP-Batch 2 | Idle |
* FP represents forward-pass (e.g., FP-Batch 1 means forward pass for mini-batch 1).
** BP represents back-propagation (e.g., BP-Batch 1 means back-propagation for mini-batch 1).
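
The benefit shown in Table 6 can be quantified with a short calculation. The sketch below transcribes the two GPU schedules and counts busy slots and finished back-propagation steps. It assumes that every time slot has the same duration (the table does not state slot lengths) and that the fifth GPU-0 slot without pipeline parallelism is the forward pass of batch 2. The names `WITH_PIPELINE`, `WITHOUT_PIPELINE`, `utilization`, and `completed_batches` are illustrative and do not come from the original work.

```python
# Time-slot schedules transcribed from Table 6 (8 slots per GPU).
# Assumption: all slots take equal time; labels are shortened (FP-1 = FP-Batch 1).
WITH_PIPELINE = {
    "GPU-0": ["FP-1", "FP-2", "Idle", "BP-1", "FP-3", "BP-2", "FP-4", "BP-3"],
    "GPU-1": ["Idle", "FP-1", "BP-1", "FP-2", "BP-2", "FP-3", "BP-3", "FP-4"],
}
WITHOUT_PIPELINE = {
    # Assumption: slot 5 on GPU-0 is read as the forward pass of batch 2.
    "GPU-0": ["FP-1", "Idle", "Idle", "BP-1", "FP-2", "Idle", "Idle", "BP-2"],
    "GPU-1": ["Idle", "FP-1", "BP-1", "Idle", "Idle", "FP-2", "BP-2", "Idle"],
}


def utilization(schedule):
    """Return the fraction of non-idle slots for each GPU."""
    return {
        gpu: sum(slot != "Idle" for slot in slots) / len(slots)
        for gpu, slots in schedule.items()
    }


def completed_batches(schedule, first_stage="GPU-0"):
    """Count batches whose back-propagation has reached the first stage."""
    return sum(slot.startswith("BP-") for slot in schedule[first_stage])


if __name__ == "__main__":
    for name, schedule in [("with", WITH_PIPELINE), ("without", WITHOUT_PIPELINE)]:
        print(name, utilization(schedule), completed_batches(schedule))
```

Under these assumptions, each GPU is busy in 7 of 8 slots with pipeline parallelism versus 4 of 8 without it, and three batches finish back-propagation on GPU-0 in the same window instead of two.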