2021 Jul 14;3(5):e200197. doi: 10.1148/ryai.2021200197

Figure 3:

The backbone convolutional neural network developed for landmark detection has a U-Net structure. More layers can be inserted in both the downsampling and upsampling branches, and more blocks can be inserted into each layer. The output layer produces per-pixel scores, which pass through a softmax function. For landmark detection on long-axis images, data from three views were used together to train one model. As shown in the input, every minibatch was assembled from randomly selected images from the three views and was used for backpropagation. A total of four layers with three or four blocks per layer were used in this experiment. Output tensor shapes are reported in the format [B, C, H, W], where B is the minibatch size, C is the number of channels, and H and W are the image height and width. Input images have one channel for image intensity, and the output has four channels: three for the landmarks and one for the background. The output illustration plots the three color-coded landmark channels and omits the background channel.
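The two mechanics described in the caption, pooling images from several views into one minibatch and applying a per-pixel softmax over the channel axis of a [B, C, H, W] tensor, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the placeholder view names (`view_a`, `view_b`, `view_c`), the tiny image size, and the random scores are all assumptions made for illustration, and nested lists stand in for real tensors.

```python
import math
import random


def assemble_minibatch(images_by_view, batch_size, rng):
    """Draw a minibatch at random from the pooled images of all views.

    Hypothetical helper: images from every view go into one pool, so a
    single batch can mix views, as the caption describes.
    """
    pooled = [(view, img) for view, imgs in images_by_view.items() for img in imgs]
    return rng.sample(pooled, batch_size)


def softmax_over_channels(scores):
    """Per-pixel softmax for scores laid out as nested lists [B][C][H][W]."""
    out = []
    for sample in scores:  # loop over the B axis
        n_c, n_h, n_w = len(sample), len(sample[0]), len(sample[0][0])
        probs = [[[0.0] * n_w for _ in range(n_h)] for _ in range(n_c)]
        for y in range(n_h):
            for x in range(n_w):
                pixel = [sample[c][y][x] for c in range(n_c)]
                peak = max(pixel)  # subtract the max for numerical stability
                exps = [math.exp(v - peak) for v in pixel]
                total = sum(exps)
                for c in range(n_c):
                    probs[c][y][x] = exps[c] / total
        out.append(probs)
    return out


def shape(tensor):
    """Return [B, C, H, W] for a four-level nested list."""
    return [len(tensor), len(tensor[0]), len(tensor[0][0]), len(tensor[0][0][0])]


rng = random.Random(0)

# Placeholder data: three views, each holding a few image identifiers.
images_by_view = {
    "view_a": ["a0", "a1", "a2"],
    "view_b": ["b0", "b1", "b2"],
    "view_c": ["c0", "c1", "c2"],
}
batch = assemble_minibatch(images_by_view, batch_size=4, rng=rng)

# Random stand-in scores with B=2, C=4 (three landmarks plus background),
# H=2, W=3. A real network would produce these from the input images.
scores = [
    [[[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)] for _ in range(4)]
    for _ in range(2)
]
probs = softmax_over_channels(scores)
print(shape(probs))
```

After the softmax, the four channel values at every pixel are non-negative and sum to one, so each pixel carries a probability for each of the three landmarks and for the background.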
