2023 Feb 27;33(8):5728–5739. doi: 10.1007/s00330-023-09478-3

Table 3.

Qualitative exploration and number of trainable parameters of deep learning architectures for DSA video

| Model | Potential advantages | Potential limitations |
| --- | --- | --- |
| 2D CNN | Simple to use; useful when the region of interest is unique to a single frame. Leverages the large-scale ImageNet dataset | Lacks temporal dependency |
| 3D CNN | Captures full spatial and temporal dependency; useful when the number of frames is large. Leverages the large-scale Kinetics video dataset | Typically larger in model size than an equivalent 2D model due to the added kernel dimension |
| Stacked 2D CNN model | Simple to use (with limited frames); easy to interpret DSA over a few individual frames. Leverages ImageNet pretraining on 2D feature extractors | Feature-level temporal dependency only; no joint spatial and temporal dependencies |
| 2D vision transformer | Robust to frame-level distortions; relaxed inductive bias | 2D features only. Limited training dataset size may limit the final performance |
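
The size difference noted for the 3D CNN can be illustrated by counting the trainable parameters of a single convolution layer. The following is a minimal sketch and is not taken from the paper: the helper name `conv_params` and the layer sizes (64 input channels, 64 output channels, kernel size 3) are assumptions chosen only for illustration.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Count trainable parameters in one convolution layer.

    `kernel` is a tuple with one kernel size per convolved dimension,
    e.g. (3, 3) for a 2D layer or (3, 3, 3) for a 3D layer.
    """
    weights_per_filter = in_ch
    for k in kernel:
        weights_per_filter *= k
    # Each output channel has one filter plus an optional bias term.
    return out_ch * (weights_per_filter + (1 if bias else 0))


# Hypothetical layer sizes, for illustration only.
in_ch, out_ch = 64, 64
params_2d = conv_params(in_ch, out_ch, (3, 3))     # spatial kernel only
params_3d = conv_params(in_ch, out_ch, (3, 3, 3))  # spatial + temporal kernel

print(params_2d)  # 36928
print(params_3d)  # 110656
```

Under these assumed sizes, the 3D layer carries roughly three times as many weights as its 2D counterpart, because the kernel gains a factor of 3 along the temporal axis. The totals for full architectures depend on the specific model design.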