Given an image with size and resolution as illustrated in (a), where the
grid represents the voxel lattice, (b)-(e) illustrate techniques for subsampling
the image in order to reduce the memory requirement for training a ConvNet on a
GPU. (b) downsampling reduces the number of voxels by combining intensity values
of multiple voxels, thereby decreasing the image resolution. In this example, a
downsampling factor of four is used for each dimension, i.e., the size of the
image is reduced from 16³ to 4³ voxels. If the
downsampling factor is sufficiently large, the memory requirement can be reduced
enough to utilize the full image extent during training on a GPU. (c)-(e)
cropping retains the original image resolution, but only a portion of the
image voxels is extracted, as denoted by the regions that have gridlines in
(c)-(e). (c) the slab has full extent in two dimensions, but limited extent in
the third dimension. (d) the 3D patch has limited extent in all three spatial
dimensions. (e) the 2D slice has full extent in two spatial dimensions, but
no 3D context is available. Note that the representations in (b) and (d) both
contain 4³ voxels and thus require the same memory; however, (b) has
global extent with low resolution, whereas (d) has local extent with high
resolution. The text below each figure describes the relative image resolution,
spatial extent, and required memory for each scenario.
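
The four subsampling schemes in (b)-(e) can be sketched with NumPy array operations. This is a minimal illustration rather than the method used in this work: the 16³ test image, the block-averaging rule used to combine intensities, the slab thickness of four voxels, and the crop offsets are all assumptions chosen for the example, and the helper name downsample_mean is hypothetical.

```python
import numpy as np

# Hypothetical 16x16x16 image standing in for panel (a).
image = np.arange(16 ** 3, dtype=np.float32).reshape(16, 16, 16)


def downsample_mean(vol, factor):
    """Combine non-overlapping factor^3 blocks by averaging their intensities.

    Averaging is only one possible way to combine voxels (assumption).
    """
    d, h, w = (s // factor for s in vol.shape)
    # Trim any remainder, then split each axis into (blocks, factor).
    trimmed = vol[:d * factor, :h * factor, :w * factor]
    blocks = trimmed.reshape(d, factor, h, factor, w, factor)
    return blocks.mean(axis=(1, 3, 5))


# (b) downsampling: full extent, lower resolution.
downsampled = downsample_mean(image, 4)

# (c) slab: full extent in two dimensions, limited in the third
# (thickness of 4 voxels is an assumption).
slab = image[6:10, :, :]

# (d) 3D patch: limited extent in all three dimensions.
patch = image[6:10, 6:10, 6:10]

# (e) 2D slice: full extent in two dimensions, no 3D context.
slice_2d = image[8, :, :]

for name, arr in [("downsampled", downsampled), ("slab", slab),
                  ("patch", patch), ("slice_2d", slice_2d)]:
    print(name, arr.shape, arr.size, "voxels")
```

With these choices, the downsampled image and the 3D patch contain the same number of voxels, which mirrors the comparison between (b) and (d): equal memory, but global extent at low resolution versus local extent at full resolution.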