
Fig. 4.

Illustration of the proposed method. Left: a deep convolutional neural network, denoted f_loc, that takes image patches of 256×256 pixels as inputs. Right: the aggregation network, f_agg, that takes the concatenation of three types of maps as input: 1) the location indicator map I (gray), generated by downscaling the binary mask indicating the location of the cropping window; 2) the saliency maps S_m and S_b (red and green), generated by the context network, which is based on the Globally Aware Multiple Instance Classifier [16]; 3) the embedding map E (blue), formed from the representation h produced by f_loc.
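To make the data flow in the figure concrete, the following is a minimal PyTorch sketch of how the input to f_agg could be assembled by channel-wise concatenation of the three map types. The map resolution, the channel width of h, and the small convolutional stand-in for f_agg are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical shapes (assumptions, not taken from the paper).
batch, H, W = 2, 92, 60   # spatial size of the downscaled maps
embed_dim = 128           # channel width of the representation h

# Location indicator map I: binary mask marking the cropping window.
I = torch.zeros(batch, 1, H, W)
I[:, :, 20:52, 10:42] = 1.0

# Saliency maps S_m and S_b from the context network (random placeholders).
S_m = torch.rand(batch, 1, H, W)
S_b = torch.rand(batch, 1, H, W)

# Embedding map E, formed from the representation h produced by f_loc.
E = torch.rand(batch, embed_dim, H, W)

# Concatenate all maps along the channel dimension: 1 + 1 + 1 + embed_dim.
x = torch.cat([I, S_m, S_b, E], dim=1)

# Stand-in for f_agg: a small convolutional head producing one score.
f_agg = nn.Sequential(
    nn.Conv2d(3 + embed_dim, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 1),
)

print(f_agg(x).shape)  # torch.Size([2, 1])
```

The key step is the single `torch.cat` along the channel dimension, which matches the figure's depiction of f_agg consuming the stacked maps as one tensor.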