A Simple Framework for Contrastive Learning of Visual Representations

How do they prevent mode collapse?