Author manuscript; available in PMC: 2021 Feb 1.
Published in final edited form as: Med Image Anal. 2019 Nov 7;60:101592. doi: 10.1016/j.media.2019.101592

Figure 1:

Given an image with the size and resolution illustrated in (a), where the grid represents the voxel lattice, (b)-(e) illustrate techniques for subsampling the image in order to reduce the memory required to train a ConvNet on a GPU. (b) Downsampling reduces the number of voxels by combining the intensity values of multiple voxels, thereby decreasing the image resolution. In this example, a downsampling factor of four is used in each dimension, i.e., the size of the image is reduced from 16³ to 4³ voxels. If the downsampling factor is sufficiently large, the memory requirement can be reduced enough to utilize the full image extent during training on a GPU. (c)-(e) Cropping retains the original image resolution; however, only a portion of the image voxels is extracted, as denoted by the regions with gridlines in (c)-(e). (c) The slab has full extent in two dimensions but limited extent in the third dimension. (d) The 3D patch has limited extent in all three spatial dimensions. (e) The 2D slice has full extent in two spatial dimensions, but no 3D context is available. Note that the representations in (b) and (d) are both 4³ voxels and thus require the same memory; however, (b) has global extent at low resolution, whereas (d) has local extent at high resolution. The text below each panel describes the relative image resolution, spatial extent, and required memory for each scenario.
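The following is a minimal NumPy sketch, not taken from the paper, illustrating the four subsampling strategies of Figure 1 on a toy 16×16×16 volume; the array names, crop offsets, and the choice of block-mean downsampling are illustrative assumptions.

```python
import numpy as np

# (a) Full image: 16^3 voxels, full resolution and extent.
volume = np.random.rand(16, 16, 16).astype(np.float32)

# (b) Downsampling by a factor of 4 per dimension via block averaging:
# global extent, low resolution, 4^3 voxels.
factor = 4
downsampled = volume.reshape(16 // factor, factor,
                             16 // factor, factor,
                             16 // factor, factor).mean(axis=(1, 3, 5))

# (c) Slab: full extent in two dimensions, limited extent (4 voxels) in the third.
slab = volume[:, :, 6:10]

# (d) 3D patch: limited extent in all three dimensions, full resolution, 4^3 voxels.
patch = volume[6:10, 6:10, 6:10]

# (e) 2D slice: full in-plane extent, no 3D context.
slice_2d = volume[:, :, 8]

print(downsampled.shape, slab.shape, patch.shape, slice_2d.shape)
# (4, 4, 4) (16, 16, 4) (4, 4, 4) (16, 16)
```

Note that `downsampled` and `patch` contain the same number of voxels, matching the observation above that (b) and (d) require the same memory while trading off spatial extent against resolution.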