2026 Jan 21;16:5925. doi: 10.1038/s41598-026-36095-z

Table 1. Cross-modal feature dimension comparison.

Modality type       | Original feature dimension | Reduced feature dimension | Semantic correlation with knowledge-graph entities
Visual (video)      | 2048 × T (T = frames)      | 512                       | 0.763
Textual description | 768 (BERT-base)            | 512                       | 0.821
Motion sequence     | 3J × T (J = joints)        | 512                       | 0.795
Integrated features |                            | 768                       | 0.879
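The table gives only the feature shapes, not the operators that reduce or fuse them. The sketch below is one way the listed dimensions could fit together, offered under stated assumptions rather than as the authors' method. It assumes temporal mean pooling over the T frames, linear projections to 512 dimensions, three coordinates per joint for the motion input, and concatenation of the three 512-d vectors (1536-d) followed by a projection to the 768-d integrated feature. Random weights stand in for learned parameters, and T = 8 and J = 25 are hypothetical values chosen only for illustration.

```python
import math
import random

# Hypothetical sizes: T (frames) and J (joints) are not specified in Table 1.
T_FRAMES = 8
N_JOINTS = 25
D_REDUCED = 512  # per-modality reduced dimension (Table 1)
D_FUSED = 768    # integrated feature dimension (Table 1)


def random_matrix(rows, cols, rng):
    """Random weights standing in for learned projection parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def mean_pool(frames):
    """Average a T x D sequence over time, giving one D-dimensional vector."""
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]


def project(weights, vec):
    """Linear projection y = W x (no bias)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_projections(rng, n_joints):
    """Hypothetical projection matrices for each modality and for fusion."""
    return {
        "video": random_matrix(D_REDUCED, 2048, rng),          # 2048 -> 512
        "text": random_matrix(D_REDUCED, 768, rng),            # 768 (BERT-base) -> 512
        "motion": random_matrix(D_REDUCED, 3 * n_joints, rng), # 3J -> 512
        "fusion": random_matrix(D_FUSED, 3 * D_REDUCED, rng),  # 1536 -> 768
    }


def fuse(video_frames, text_vec, motion_frames, proj):
    """Reduce each modality to 512-d, concatenate to 1536-d, project to 768-d."""
    reduced = {
        "video": project(proj["video"], mean_pool(video_frames)),
        "text": project(proj["text"], text_vec),
        "motion": project(proj["motion"], mean_pool(motion_frames)),
    }
    concat = reduced["video"] + reduced["text"] + reduced["motion"]
    fused = l2_normalize(project(proj["fusion"], concat))
    return reduced, fused


rng = random.Random(0)
proj = build_projections(rng, N_JOINTS)
video = [[rng.random() for _ in range(2048)] for _ in range(T_FRAMES)]          # T x 2048
text = [rng.random() for _ in range(768)]                                      # 768
motion = [[rng.random() for _ in range(3 * N_JOINTS)] for _ in range(T_FRAMES)]  # T x 3J
reduced, fused = fuse(video, text, motion, proj)
print({k: len(v) for k, v in reduced.items()}, len(fused))
# prints {'video': 512, 'text': 512, 'motion': 512} 768
```

Mean pooling is used here only because it removes the variable T with the fewest assumptions; an attention-based or recurrent temporal encoder would produce the same output shapes.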