Algorithm 1. Focal CLIP Contrastive Loss
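| Line | Operation | Comment |
|---|---|---|
| 1 | **procedure** FocalCLIPLoss($I, T, \gamma, t$) | |
| 2 | **Input:** image embeddings $I \in \mathbb{R}^{n \times d}$ | |
| 3 | **Input:** text embeddings $T \in \mathbb{R}^{n \times d}$ | |
| 4 | **Input:** focusing parameter $\gamma \ge 0$ | |
| 5 | **Input:** learnable scaling parameter $t$ | |
| 6 | $I \leftarrow I / \lVert I \rVert_2$, $T \leftarrow T / \lVert T \rVert_2$ (row-wise) | ▷ Normalize embeddings |
| 7 | $S \leftarrow I T^\top$ | ▷ Similarity matrix |
| 8 | $Z \leftarrow e^{t} \cdot S$ | |
| 9 | $n \leftarrow$ number of rows of $I$ | |
| 10 | $y \leftarrow (1, 2, \dots, n)$ | ▷ Groundtruth indices |
| 11 | **procedure** FocalLoss($Z, y, \gamma$) | |
| 12 | $P \leftarrow \operatorname{softmax}(Z)$ (row-wise) | |
| 13 | **for all** $i \in \{1, \dots, n\}$: $p_i \leftarrow P_{i, y_i}$ | ▷ Probabilities of all true classes |
| 14 | $\ell_i \leftarrow -(1 - p_i)^{\gamma} \log p_i$ | ▷ Focal-weighted loss |
| 15 | **return** $\operatorname{mean}(\ell)$ | |
| 16 | **end procedure** | |
| 17 | $\mathcal{L}_{I \to T} \leftarrow$ FocalLoss($Z, y, \gamma$) | |
| 18 | $\mathcal{L}_{T \to I} \leftarrow$ FocalLoss($Z^\top, y, \gamma$) | |
| 19 | $\mathcal{L} \leftarrow \tfrac{1}{2}\left(\mathcal{L}_{I \to T} + \mathcal{L}_{T \to I}\right)$ | ▷ Symmetric loss |
| 20 | **return** $\mathcal{L}$ | |
| 21 | **end procedure** | |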
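The algorithm can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It makes three assumptions that are not stated in the extracted algorithm: logits are scaled by $e^{t}$ as in CLIP, the $i$-th image and the $i$-th text form the positive pair, and the function names (`focal_clip_loss`, `focal_loss`, `l2_normalize_rows`) are hypothetical. When $\gamma = 0$, the focal weight $(1 - p_i)^{\gamma}$ equals 1, and the loss reduces to the ordinary symmetric cross-entropy CLIP loss.

```python
import math
from typing import List

Matrix = List[List[float]]


def l2_normalize_rows(x: Matrix) -> Matrix:
    """Scale each row to unit Euclidean norm (step 6)."""
    out = []
    for row in x:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def focal_loss(logits: Matrix, gamma: float) -> float:
    """Mean focal loss; assumes the true class of row i is column i (steps 11-16)."""
    losses = []
    for i, row in enumerate(logits):
        # Numerically stable log-softmax of row i (subtract the row max first).
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        log_p = row[i] - log_z  # log-probability of the true (diagonal) pair
        p = math.exp(log_p)
        # Focal weight (1 - p)^gamma down-weights pairs that are already easy.
        losses.append(-((1.0 - p) ** gamma) * log_p)
    return sum(losses) / len(losses)


def focal_clip_loss(image_emb: Matrix, text_emb: Matrix,
                    gamma: float, t: float) -> float:
    """Symmetric focal CLIP loss; assumes CLIP-style exp(t) logit scaling."""
    img = l2_normalize_rows(image_emb)
    txt = l2_normalize_rows(text_emb)
    scale = math.exp(t)
    # Scaled cosine-similarity logits: logits[i][j] = exp(t) * <img_i, txt_j>.
    logits = [[scale * sum(a * b for a, b in zip(u, v)) for v in txt]
              for u in img]
    loss_i2t = focal_loss(logits, gamma)             # each image vs. all texts
    loss_t2i = focal_loss(transpose(logits), gamma)  # each text vs. all images
    return 0.5 * (loss_i2t + loss_t2i)  # symmetric loss (step 19)
```

For example, `focal_clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], 2.0, 0.0)` scores two perfectly aligned pairs. Subtracting the row maximum before exponentiating keeps the log-softmax stable for large $e^{t}$. In actual training, one would use a tensor library with automatic differentiation so that $t$ can be learned.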