2024 Nov 30;24(23):7682. doi: 10.3390/s24237682

Figure 3.

Overview of the proposed RGB-PoseTransformer3D model. The input sequence consists of RGB and pose modalities, which are fed into asymmetric two-stage 3D CNN pathways and fused at an early stage. Self-attention models the intermediate features of these pathways, and the global cross complementary block (GCCB) implements the cross-modal information flow within the network.
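
The caption names cross-modal information flow between the RGB and pose pathways but does not spell out how the GCCB computes it. As a rough illustration only, the sketch below shows one generic way two feature streams can exchange information: plain scaled dot-product cross-attention with a residual add. It is not the paper's GCCB. The function names (`cross_attention`, `cross_modal_exchange`), the identity query/key/value projections, and the toy token values are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Each query token attends over all context tokens.

    Assumption: identity projections (no learned W_q, W_k, W_v),
    so keys and values are the context tokens themselves.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in context]
        weights = softmax(scores)
        # Weighted sum of context tokens, one feature dimension at a time.
        out.append([sum(w * k[j] for w, k in zip(weights, context))
                    for j in range(d)])
    return out


def cross_modal_exchange(rgb_tokens, pose_tokens):
    """Hypothetical two-way exchange: each stream adds, as a residual,
    what it attends to in the other stream."""
    rgb_from_pose = cross_attention(rgb_tokens, pose_tokens)
    pose_from_rgb = cross_attention(pose_tokens, rgb_tokens)
    rgb_out = [[a + b for a, b in zip(t, u)]
               for t, u in zip(rgb_tokens, rgb_from_pose)]
    pose_out = [[a + b for a, b in zip(t, u)]
                for t, u in zip(pose_tokens, pose_from_rgb)]
    return rgb_out, pose_out


# Toy features: 3 RGB tokens and 2 pose tokens, each of dimension 2.
rgb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pose = [[0.5, 0.5], [1.0, -1.0]]
rgb_out, pose_out = cross_modal_exchange(rgb, pose)
```

Each stream keeps its own token count and feature dimension after the exchange, so a block of this kind could sit between stages of two pathways without changing their shapes. A real model would add learned projections, multiple heads, and normalization.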