Schematic diagrams of a convolutional VAE, a spatial transformer, and the COPIT-based VAE. In a convolutional VAE, a convolutional encoder generates means and standard deviations, from which latent variables are sampled; the output image is then generated from the sampled latent variables by a convolutional decoder (A). In a spatial transformer, a convolutional localization net generates latent variables that serve as transformation coefficients for generating a sampling grid; the output image is then calculated from the sampling grid and the standard image (B). In our new model, latent variables are generated in the same way as in the convolutional VAE, and the image is sampled in a way similar to the spatial transformer, except that the grid generator is replaced with COPIT, which has two versions (six symmetrical and one asymmetrical). The output image is calculated from the final sampling grid and the standard image using linearized multi-sampling (C).
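
The data flow described in the caption can be illustrated with a minimal sketch: latent variables are sampled from means and standard deviations as in (A), then used as transformation coefficients to build a sampling grid that resamples a standard image as in (B). This is not the model itself. The six-coefficient affine transform and plain bilinear interpolation are stand-ins chosen for illustration; COPIT and linearized multi-sampling are not specified in the caption and are not implemented here. All function names are hypothetical.

```python
import math
import random


def reparameterize(mean, std, rng):
    """Sample z = mean + std * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]


def affine_grid(theta, height, width):
    """Build a sampling grid in normalized [-1, 1] coordinates.

    theta = [a, b, tx, c, d, ty] is a row-major 2x3 affine matrix
    (an assumed stand-in for the grid generator). Requires height, width >= 2.
    """
    a, b, tx, c, d, ty = theta
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1)
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1)
            row.append((a * x + b * y + tx, c * x + d * y + ty))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Resample image at the grid locations with bilinear interpolation.

    Locations outside the image contribute zero. This is ordinary bilinear
    sampling, not linearized multi-sampling.
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for row in grid:
        out_row = []
        for x, y in row:
            # Convert normalized coordinates to pixel coordinates.
            px = (x + 1.0) * (w - 1) / 2.0
            py = (y + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            fx, fy = px - x0, py - y0
            out_row.append(
                (1 - fx) * (1 - fy) * pixel(y0, x0)
                + fx * (1 - fy) * pixel(y0, x0 + 1)
                + (1 - fx) * fy * pixel(y0 + 1, x0)
                + fx * fy * pixel(y0 + 1, x0 + 1)
            )
        out.append(out_row)
    return out


# Usage: perturb the identity transform with sampled latent variables.
rng = random.Random(0)
identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]  # means, as an encoder might output
z = reparameterize(identity, [0.05] * 6, rng)  # assumed standard deviations
standard_image = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
output = bilinear_sample(standard_image, affine_grid(z, 3, 3))
```

With the identity coefficients and zero standard deviation, the grid coincides with the pixel centers and the output reproduces the standard image; nonzero standard deviations yield slightly shifted, scaled, or sheared versions of it.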