Figure 2. Convolutional Neural Network (CNN) architecture.

The model input is a series of 25 PARs, each with 15 z-slices of height 128 and width 128. The model output consists of 4 CPs (CP1 to CP4 in the B-spline model) for each of the 6 motion variables. Conv3D = 3D convolution layer. CP = control point.
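The input and output dimensions given in the caption can be checked with a short shape-bookkeeping sketch. It does not reproduce the layer stack shown in the figure. The axis order, the row-major layout of the output, and the names `input_shape`, `output_shape`, `num_elements`, and `unflatten_output` are assumptions made for illustration only.

```python
# Shape bookkeeping for the CNN in Figure 2 (dimensions taken from the caption).
# The layers themselves are not modelled here.

N_PARS = 25          # PARs in the input series
N_SLICES = 15        # z-slices per PAR
HEIGHT = 128         # in-plane height
WIDTH = 128          # in-plane width
N_MOTION_VARS = 6    # motion variables predicted by the model
N_CPS = 4            # control points per motion variable (CP1 to CP4)

# Assumed axis order (hypothetical): (PAR, z, height, width).
input_shape = (N_PARS, N_SLICES, HEIGHT, WIDTH)
# Assumed layout (hypothetical): one row of control points per motion variable.
output_shape = (N_MOTION_VARS, N_CPS)


def num_elements(shape):
    """Return the total number of scalar entries in a tensor of this shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def unflatten_output(flat):
    """Split a flat list of 24 values into 6 rows of 4 control points.

    Assumes row-major order: all CPs of the first motion variable come first.
    """
    if len(flat) != num_elements(output_shape):
        raise ValueError("expected %d values" % num_elements(output_shape))
    return [flat[i * N_CPS:(i + 1) * N_CPS] for i in range(N_MOTION_VARS)]


if __name__ == "__main__":
    print("input voxels per sample:", num_elements(input_shape))
    print("output values per sample:", num_elements(output_shape))
```

Under these assumptions, each sample carries 25 × 15 × 128 × 128 input voxels and the network regresses 6 × 4 = 24 control-point values.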