Overview of the architecture of AOSLO-net. MA segmentation proceeds in the following steps: data preprocessing, deep neural network training and inference, and postprocessing with output ensembling. (a) Preprocessing of AOSLO images for training AOSLO-net. We first created the perfusion map by computing the deviations across the frames of the AOSLO video. We also created a set of enhanced AOSLO images from the same AOSLO video. These two sets of images are concatenated to form a third set of two-channel MA images. Data augmentation was performed on the two-channel MA images to increase the diversity of the training set (see examples in Supplementary Fig. S2). (b) Detailed modular components of AOSLO-net. The inputs (i.e., the preprocessed multimodal images) produced in (a) are fed into AOSLO-net, which consists of an EfficientNet-b3 encoder (c) and a regular UNet decoder. We then perform postprocessing and ensembling on the output images of AOSLO-net to generate the segmentation map. (c) Detailed architecture of EfficientNet-b3, which serves as the “encoder” in the proposed AOSLO-net.
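The data flow in panel (a), plus the ensembling step in panel (b), can be illustrated with a minimal NumPy sketch. Several details are assumptions rather than the authors' method. The perfusion map is taken as the per-pixel temporal standard deviation of a registered video with shape (T, H, W). The "enhanced" image is a min-max-stretched temporal mean. Augmentation is limited to random flips. Ensembling is a mean of per-model probability maps followed by a fixed threshold. All function names are hypothetical. The encoder-decoder network itself is not shown.

```python
import numpy as np


def min_max(img: np.ndarray) -> np.ndarray:
    """Rescale an image to [0, 1]; a constant image maps to all zeros."""
    lo, hi = float(img.min()), float(img.max())
    if hi - lo < 1e-12:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)


def perfusion_map(video: np.ndarray) -> np.ndarray:
    """Assumed perfusion map: per-pixel standard deviation over frames (axis 0)."""
    return min_max(video.std(axis=0))


def enhanced_image(video: np.ndarray) -> np.ndarray:
    """Assumed enhancement: temporal mean followed by a min-max contrast stretch."""
    return min_max(video.mean(axis=0))


def two_channel_input(video: np.ndarray) -> np.ndarray:
    """Concatenate the enhanced image and the perfusion map into a (2, H, W) array."""
    return np.stack([enhanced_image(video), perfusion_map(video)], axis=0)


def augment_flip(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random flips applied jointly to every channel so the two modalities stay aligned."""
    if rng.random() < 0.5:
        x = x[:, :, ::-1]  # horizontal flip
    if rng.random() < 0.5:
        x = x[:, ::-1, :]  # vertical flip
    return np.ascontiguousarray(x)


def ensemble(prob_maps, threshold: float = 0.5) -> np.ndarray:
    """Assumed ensembling: average per-model probability maps, then binarize."""
    mean_prob = np.mean(np.stack(prob_maps), axis=0)
    return (mean_prob >= threshold).astype(np.uint8)


rng = np.random.default_rng(0)
video = rng.random((16, 64, 64))      # synthetic stand-in for a registered AOSLO video
x = two_channel_input(video)          # shape (2, 64, 64)
x_aug = augment_flip(x, rng)
mask = ensemble([np.array([[0.2, 0.8]]), np.array([[0.4, 0.6]])])
```

Augmenting the concatenated two-channel array, rather than each image set on its own, ensures that any geometric transform is applied identically to both modalities. A segmentation mask would need the same transform during training.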