Algorithm 1: Enhanced Multi-Modal Feature Fusion at scale s (Equations (2)–(15))
Input:
 RGB image I ∈ ℝ^(H×W×3)
 Sparse depth map D ∈ ℝ^(H×W×1)
 Pre-processed dense depth (from BP) D_s^pre ∈ ℝ^(H×W)
 Camera intrinsics K ∈ ℝ^(3×3)
Output:
 Fused feature map F_s ∈ ℝ^(H×W×C)
Notation:
⊕: channel concatenation;
⊗: channel-wise multiplication;
GAP: global average pooling;
δ(·): ReLU;
σ(·): Sigmoid;
D_s^pre: dense depth pre-processed via Bilateral Propagation
Procedure:
1: Step 1: Back-projection to 3D
2:  Construct sparse point cloud P = BackProject(D, K).
3: Step 2: 2D feature encoding
4:  Form 2D input X = [I ⊕ D_s^pre]
5:  F_2D = UNet(X)
6: Step 3: Dual-branch PointNet++-S
7:  For each branch b ∈ {local, context}:
8:   For each SA layer l with config (n_l, r_l, k_l):
9:    g_b^l ← PointNetSA(P; n_l, r_l, k_l)
10:   End for
11:  End for
12:  Aggregate to multi-scale 3D feature F_3D = Concat(F_local^2, F_context^2)
13: Step 4: Channel recalibration on 3D
14:  z = GAP(F_3D);
15:  s = σ(W_2 δ(W_1 z));
16:  G_3D = s ⊗ F_3D.
17: Step 5: Spatial broadcasting of 3D feature
18:  G̃_3D = Broadcast(G_3D, H×W)
19: Step 6: CAFFM: channel-attention fusion
20:  F_in = Concat(F_2D, G̃_3D);
21:  z = GAP(F_in);
22:  α = σ(W_2 δ(W_1 z));
23:  F_s = α ⊗ F_in.
24: return F_s.
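
The operators in the algorithm box are left abstract. As an illustration of Step 1, the following is a minimal PyTorch sketch of pinhole back-projection of the sparse depth map; the function name back_project, the tensor layouts, and the validity mask (depth > 0) are assumptions for illustration, not details taken from the paper.

```python
import torch

def back_project(depth: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    """Lift the valid pixels of a sparse depth map D (H, W) to 3D points (N, 3).

    Assumes the standard pinhole model: for pixel (u, v) with depth d,
    X = d * (u - cx) / fx, Y = d * (v - cy) / fy, Z = d.
    """
    H, W = depth.shape
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = torch.meshgrid(torch.arange(H, dtype=depth.dtype),
                          torch.arange(W, dtype=depth.dtype), indexing="ij")
    mask = depth > 0                          # keep only pixels with a depth sample
    d = depth[mask]
    x = d * (u[mask] - cx) / fx
    y = d * (v[mask] - cy) / fy
    return torch.stack((x, y, d), dim=-1)     # sparse point cloud P, shape (N, 3)
```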
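Steps 4–5 follow the squeeze-and-excitation pattern: global average pooling, two linear layers W_1 and W_2 with ReLU and Sigmoid, and channel-wise gating, followed by spatial broadcasting. The sketch below assumes F_3D is a per-point feature map of shape (B, C, N) and that "Broadcast" tiles one pooled vector per sample over the H×W grid; the reduction ratio r and these shape conventions are illustrative, not specified in the algorithm box.

```python
import torch
import torch.nn as nn

class ChannelRecalibrationBroadcast(nn.Module):
    """Sketch of Steps 4-5: channel recalibration of the 3D feature F_3D,
    then spatial broadcasting of the recalibrated descriptor to an H×W grid.
    Reduction ratio r and the tiling-based Broadcast are assumptions.
    """
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.w1 = nn.Linear(channels, channels // r)   # W_1
        self.w2 = nn.Linear(channels // r, channels)   # W_2

    def forward(self, f3d: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # f3d: (B, C, N) per-point features from the dual-branch PointNet++-S
        z = f3d.mean(dim=2)                                   # z = GAP(F_3D), (B, C)
        s = torch.sigmoid(self.w2(torch.relu(self.w1(z))))    # s = σ(W_2 δ(W_1 z))
        g3d = s.unsqueeze(2) * f3d                            # G_3D = s ⊗ F_3D
        g = g3d.mean(dim=2)                                   # one descriptor per sample
        return g[:, :, None, None].expand(-1, -1, H, W)       # G̃_3D, (B, C, H, W)
```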
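Step 6 (CAFFM) applies the same gating mechanism to the concatenation of the 2D feature and the broadcast 3D feature. The sketch below, with illustrative channel counts and reduction ratio, shows one plausible reading of the algorithm box.

```python
import torch
import torch.nn as nn

class CAFFM(nn.Module):
    """Sketch of Step 6: concatenate F_2D with the broadcast 3D feature and
    re-weight the fused channels with a sigmoid attention vector α.
    Channel counts and the reduction ratio r are illustrative.
    """
    def __init__(self, c2d: int, c3d: int, r: int = 4):
        super().__init__()
        c = c2d + c3d
        self.w1 = nn.Linear(c, c // r)   # W_1
        self.w2 = nn.Linear(c // r, c)   # W_2

    def forward(self, f2d: torch.Tensor, g3d_tilde: torch.Tensor) -> torch.Tensor:
        f_in = torch.cat((f2d, g3d_tilde), dim=1)               # F_in = Concat(F_2D, G̃_3D)
        z = f_in.mean(dim=(2, 3))                                # z = GAP(F_in)
        alpha = torch.sigmoid(self.w2(torch.relu(self.w1(z))))   # α = σ(W_2 δ(W_1 z))
        return alpha[:, :, None, None] * f_in                    # F_s = α ⊗ F_in
```

A hypothetical end-to-end usage of the two sketches, with made-up shapes (B = 2, H = W = 64, 64-channel 2D features, 128-channel 3D features over 500 points):

```python
f2d = torch.randn(2, 64, 64, 64)
g3d = ChannelRecalibrationBroadcast(128)(torch.randn(2, 128, 500), 64, 64)
fs = CAFFM(64, 128)(f2d, g3d)   # fused feature F_s, shape (2, 192, 64, 64)
```

In this reading, α re-weights each fused channel globally, so the decoder receives a fused map F_s in which 2D and 3D channels are balanced per sample rather than per pixel.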