Sigmoid Loss for Language Image Pre-Training (SigLIP)
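SigLIP is a variant of CLIP that replaces CLIP's softmax-based contrastive loss with a sigmoid-based contrastive loss. Each image-text pair in the batch is scored independently as a binary match/no-match decision, so the loss needs no normalization across the whole batch.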
It is implemented with a Vision Transformer as the image encoder and trained with the sigmoid contrastive loss.
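
To make the loss concrete, here is a minimal NumPy sketch of a pairwise sigmoid loss over one batch. It is an illustration, not this repository's implementation: the function name `sigmoid_loss` is hypothetical, and the temperature (10) and bias (-10) are held fixed at the initial values reported in the SigLIP paper, whereas in training they are learnable parameters.

```python
import numpy as np


def sigmoid_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss for a batch of image and text embeddings.

    img_emb, txt_emb: arrays of shape (batch, dim); row i of each is a matching pair.
    temperature, bias: fixed here for simplicity (assumption); learnable in practice.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    # Logit for every image-text combination in the batch: shape (batch, batch).
    logits = temperature * (img @ txt.T) + bias

    # Labels: +1 on the diagonal (matching pairs), -1 everywhere else.
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0

    # Numerically stable log-sigmoid: log(sigmoid(x)) = -logaddexp(0, -x).
    log_likelihood = -np.logaddexp(0.0, -labels * logits)

    # Sum over all pairs, averaged over the batch size.
    return -log_likelihood.sum() / n


rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = rng.normal(size=(4, 8))

loss_random = sigmoid_loss(img, txt)   # unrelated embeddings
loss_matched = sigmoid_loss(img, img)  # perfectly aligned pairs
print(f"random: {loss_random:.3f}  matched: {loss_matched:.3f}")
```

Unlike a softmax loss, each entry of the `(batch, batch)` logit matrix contributes its own independent binary term, which is why aligned pairs give a lower loss than unrelated ones without any row-wise or column-wise normalization.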
Resources