Figure 1.
Overview of the proposed method. The image I is fed to the network f(·): the encoder E(·) extracts the latent representation z(·), which is used for the classification tasks {ŷ_t}; the decoder D(·) is trained to reconstruct the image I from z(·), producing the auxiliary output Î.
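
The data flow in the figure can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the class name `MultiTaskAutoencoder`, the helpers `linear` and `init_weights`, the use of bias-free dense layers with a ReLU, and all dimensions are hypothetical, since the caption does not specify layer types or sizes. The sketch only shows that a shared latent code z feeds both the per-task classification heads and the reconstruction decoder.

```python
import random


def linear(weights, x):
    """Bias-free dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_weights(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskAutoencoder:
    """Hypothetical stand-in for f(.): encoder E, task heads, decoder D."""

    def __init__(self, input_dim, latent_dim, classes_per_task, seed=0):
        rng = random.Random(seed)
        self.encoder = init_weights(latent_dim, input_dim, rng)  # E(.)
        self.decoder = init_weights(input_dim, latent_dim, rng)  # D(.)
        # One classification head per task t, all reading the same z.
        self.heads = [init_weights(c, latent_dim, rng) for c in classes_per_task]

    def forward(self, image):
        # Latent representation z = E(I), with a ReLU non-linearity (assumed).
        z = [max(0.0, v) for v in linear(self.encoder, image)]
        # Classification logits, one list per task.
        y_hat = [linear(head, z) for head in self.heads]
        # Auxiliary output: reconstruction of the image from z.
        i_hat = linear(self.decoder, z)
        return z, y_hat, i_hat


# Toy flattened "image" with 16 pixels and two tasks (3 and 2 classes).
model = MultiTaskAutoencoder(input_dim=16, latent_dim=4, classes_per_task=[3, 2])
image = [0.5] * 16
z, y_hat, i_hat = model.forward(image)
```

In training, one would typically combine the per-task classification losses on `y_hat` with a reconstruction loss between `i_hat` and `image`; the weighting between them is not given in the caption and is left open here.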
