Algorithm 1: Enhanced Multi-Modal Feature Fusion at Scale (Equations (2)–(15))

Input: RGB image I; sparse depth map S; pre-processed dense depth D (from BP); camera intrinsics K.
Output: Fused feature map F_fused.
Notation: ⊕: channel concatenation; ⊗: channel-wise multiplication; GAP(·): global average pooling; δ(·): ReLU; σ(·): Sigmoid; D: dense depth pre-processed via Bilateral Propagation.

Procedure:
1:  Step 1: Back-projection to 3D
2:    Construct sparse point cloud P = { d_i · K^(-1) [u_i, v_i, 1]^T | S(u_i, v_i) = d_i > 0 }.
3:  Step 2: 2D feature encoding
4:    Form 2D input X = I ⊕ S ⊕ D.
5:    F_2D = Encoder_2D(X).
6:  Step 3: Dual-branch PointNet++-S
7:    For each branch b ∈ {1, 2}:
8:      For each SA layer l with config (N_l, r_l, K_l):
9:        F_b^(l) = SA_l(F_b^(l-1))
10:     End for
11:   End for
12:   Aggregate to multi-scale 3D feature F_3D.
13: Step 4: Channel recalibration on 3D
14:   z = GAP(F_3D);
15:   w = σ(W_2 δ(W_1 z));
16:   F_3D' = w ⊗ F_3D.
17: Step 5: Spatial broadcasting of 3D feature
18:   F_3D'' = Broadcast(F_3D') onto the 2D image grid.
19: Step 6: CAFFM: channel-attention fusion
20:   F_cat = F_2D ⊕ F_3D'';
21:   z' = GAP(F_cat);
22:   a = σ(W_4 δ(W_3 z'));
23:   F_fused = a ⊗ F_cat.
24: return F_fused.
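The back-projection, channel-recalibration, and fusion steps above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: all function names, tensor shapes, and weight-matrix shapes (an SE-style reduction, here with ratio r) are assumptions made for the example.

```python
import numpy as np


def back_project(sparse_depth, K):
    """Step 1 (sketch): lift valid sparse-depth pixels to a 3D point cloud.

    sparse_depth: (H, W) array, 0 where no measurement; K: (3, 3) intrinsics.
    Returns an (N, 3) camera-frame point cloud.
    """
    v, u = np.nonzero(sparse_depth > 0)
    d = sparse_depth[v, u]
    pix = np.stack([u, v, np.ones_like(u)]).astype(float)  # (3, N) homogeneous pixels
    rays = np.linalg.inv(K) @ pix                          # viewing rays per pixel
    return (d * rays).T                                    # scale rays by depth


def gap(f):
    """Global average pooling over spatial dims: (C, H, W) -> (C,)."""
    return f.mean(axis=(1, 2))


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_recalibration(f3d, w1, w2):
    """Step 4 (sketch): squeeze-and-excitation-style recalibration.

    f3d: (C, H, W); w1: (C // r, C); w2: (C, C // r) with reduction ratio r.
    """
    z = gap(f3d)                    # squeeze: one statistic per channel
    w = sigmoid(w2 @ relu(w1 @ z))  # excitation: per-channel weights in (0, 1)
    return w[:, None, None] * f3d   # channel-wise multiplication


def caffm_fusion(f2d, f3d_broadcast, w3, w4):
    """Steps 5-6 (sketch): concatenate the 2D feature with the spatially
    broadcast 3D feature, then reweight the concatenated channels."""
    f_cat = np.concatenate([f2d, f3d_broadcast], axis=0)  # channel concatenation
    a = sigmoid(w4 @ relu(w3 @ gap(f_cat)))               # attention over 2C channels
    return a[:, None, None] * f_cat                       # fused feature map
```

In this reading, both the recalibration and the fusion reuse the same squeeze-excite pattern (GAP, then a two-layer bottleneck with ReLU and Sigmoid), differing only in whether the weights gate one branch or the concatenated pair.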