[Preprint]. 2025 Aug 22:2024.11.02.621624. [Version 3] doi: 10.1101/2024.11.02.621624

Algorithm 1.

Focal CLIP Contrastive Loss

1: procedure FocalCLIPLoss(I, T, γ, τ)
2: Input: Image embeddings I
3: Input: Text embeddings T
4: Input: Focusing parameter γ
5: Input: Learnable scaling parameter τ
6: I ← I / ‖I‖₂, T ← T / ‖T‖₂ ▷ Normalize embeddings
7: S ← I Tᵀ / τ ▷ Similarity matrix
8: P_img ← softmax(S)
9: P_txt ← softmax(Sᵀ)
10: Y ← [0, 1, …, N−1] ▷ Ground-truth indices
11: procedure FocalLoss(P,Y)
12:   ce ← NLLLoss(P, Y)
13:   P_t ← P[i, Y[i]] for all i ▷ Probabilities of all true classes
14:   loss ← (1 − P_t)^γ · ce ▷ Focal-weighted loss
15:   return mean(loss)
16: end procedure
17: L_img ← FocalLoss(P_img, Y)
18: L_txt ← FocalLoss(P_txt, Y)
19: L ← (L_img + L_txt) / 2 ▷ Symmetric loss
20: return L
21: end procedure
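
The procedure above can be sketched in plain Python using only the standard library. This is an illustration under stated assumptions, not the authors' implementation: the function names (focal_clip_loss, focal_loss, softmax_rows, l2_normalize) are hypothetical, the softmax is assumed to run over rows, the text-to-image direction is assumed to use the transposed similarity matrix, and the NLLLoss step is assumed to act on log-probabilities, so that the per-sample cross-entropy is −log P_t. A practical version would use a tensor library and batched operations.

```python
import math


def l2_normalize(vectors):
    """Scale each embedding to unit Euclidean norm (assumes no zero vectors)."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm for x in v])
    return out


def softmax_rows(matrix):
    """Row-wise softmax, subtracting the row maximum for numerical stability."""
    out = []
    for row in matrix:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def focal_loss(probs, targets, gamma):
    """Mean over the batch of (1 - p_t)^gamma * (-log p_t)."""
    losses = []
    for i, y in enumerate(targets):
        p_t = probs[i][y]            # probability assigned to the true class
        ce = -math.log(p_t)          # assumption: NLL taken on log-probabilities
        losses.append((1.0 - p_t) ** gamma * ce)
    return sum(losses) / len(losses)


def focal_clip_loss(image_emb, text_emb, gamma, tau):
    """Symmetric focal contrastive loss for N paired image/text embeddings."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    n = len(img)
    # Similarity matrix: sim[i][j] = <img_i, txt_j> / tau
    sim = [[sum(a * b for a, b in zip(img[i], txt[j])) / tau for j in range(n)]
           for i in range(n)]
    sim_t = [list(col) for col in zip(*sim)]   # transpose for the text direction
    p_img = softmax_rows(sim)
    p_txt = softmax_rows(sim_t)
    targets = list(range(n))                   # pair i matches pair i
    l_img = focal_loss(p_img, targets, gamma)
    l_txt = focal_loss(p_txt, targets, gamma)
    return (l_img + l_txt) / 2.0
```

With γ = 0 the focal weight (1 − P_t)^γ equals 1 for every pair, so the sketch reduces to the ordinary symmetric cross-entropy contrastive loss. Larger γ shrinks the contribution of pairs that are already matched with high probability.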