Visualization of the three worst (left) and three best (right) segmentations in the held-out test dataset (scans from sites not seen during training). For each case, the left panel shows the raw T2-FLAIR image, the middle panel shows the target FIRST segmentation in red, and the right panel shows the resulting DeepGRAI segmentation. Dice scores for the worst cases were 85.9%, 86.6%, and 87.0% (top to bottom), and Dice scores for the best cases were 96.2%, 96.2%, and 96.3% (top to bottom). The average Dice score across all 459 test cases was 93.6%. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)