2023 Feb 27;33(8):5728–5739. doi: 10.1007/s00330-023-09478-3

Table 3.

Qualitative exploration and number of trainable parameters of deep learning architectures for DSA video

Model	Potential advantages	Potential limitations
2D CNN	Simple to use; useful when the region of interest is unique to a single frame. Leverages the large-scale ImageNet dataset	Does not model temporal dependency
3D CNN	Captures full spatial and temporal dependency; useful when the number of frames is large. Leverages the large-scale Kinetics video dataset	Typically larger in model size than an equivalent 2D model due to the added kernel dimension
Stacked 2D CNN model	Simple to use (with limited frames), easy to interpret DSA over a few individual frames. Leverages ImageNet pretraining on 2D feature extractors	Feature-level temporal dependency only. There are no joint spatial and temporal dependencies
2D vision transformer	Robust to frame-level distortions, relaxed inductive bias	2D features only. Limited training dataset size may limit the final performance
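
The model-size limitation noted for the 3D CNN can be illustrated with a short parameter count. This is a minimal sketch, not the study's own calculation: the helper `conv_params` and the layer sizes (64 input channels, 64 output channels, kernel size 3) are hypothetical values chosen only for illustration.

```python
def conv_params(in_ch, out_ch, kernel, bias=True):
    """Count trainable parameters in one convolution layer.

    `kernel` is a tuple of kernel sizes, one per convolved dimension,
    e.g. (3, 3) for a 2D layer or (3, 3, 3) for a 3D layer.
    """
    weights = in_ch * out_ch
    for k in kernel:
        weights *= k
    return weights + (out_ch if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
p2d = conv_params(64, 64, (3, 3))      # 2D kernel: height x width
p3d = conv_params(64, 64, (3, 3, 3))   # 3D kernel: time x height x width

print(p2d)  # 36928
print(p3d)  # 110656
```

Under these assumptions the 3D layer holds roughly three times as many weights as its 2D counterpart, because the added temporal dimension multiplies the kernel volume by the temporal kernel size.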