Bidirectional Encoder Representations from Transformers (BERT)
BERT is an encoder-only, bidirectional Transformer; it is not a generative model.
During pretraining, BERT is trained to predict masked tokens (masked language modeling): some input tokens are hidden and the model must recover them from context on both sides.
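A minimal sketch of how BERT-style masking prepares a training example. This assumes the standard 80/10/10 rule from the BERT paper (of the ~15% selected positions: 80% replaced with [MASK], 10% with a random token, 10% left unchanged); the function and variable names here are made up for illustration.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, {position: original token} to predict)."""
    rng = random.Random(seed)
    out, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # the model's target at this position
            r = rng.random()
            if r < 0.8:
                out[i] = MASK        # 80%: replace with [MASK]
            elif r < 0.9:
                out[i] = rng.choice(VOCAB)  # 10%: replace with a random token
            # else 10%: keep the original token unchanged
    return out, labels

corrupted, targets = mask_tokens(["the", "cat", "sat", "on", "the", "mat"])
```

During pretraining, the loss is computed only at the positions in `targets`; the model sees `corrupted` as input and must predict the original tokens.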
Resources
RoBERTa
This is essentially BERT trained on more text data, for longer, with larger batches, dynamic masking, and without the next-sentence-prediction objective.