Mask Token

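Ran into this while reading the MAE and BERT papers. Essentially it's just another special token in the vocabulary, like <BOS> / <EOS> (see Transformer). In BERT, some input tokens are swapped for [MASK] and the model is trained to predict the originals.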
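To make the "just another token" point concrete, here is a minimal Python sketch of masking at the token-id level. The toy vocabulary, `mask_tokens`, and the masking probability are hypothetical. It is also simplified: BERT selects about 15% of positions and replaces only 80% of those with [MASK] (10% get a random token, 10% stay unchanged), which this sketch skips.

```python
import random

# Hypothetical toy vocabulary: [MASK] is just one more entry, like <BOS>/<EOS>.
VOCAB = ["<BOS>", "<EOS>", "[MASK]", "the", "cat", "sat", "on", "mat"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
MASK_ID = TOKEN_TO_ID["[MASK]"]
SPECIAL_IDS = {TOKEN_TO_ID["<BOS>"], TOKEN_TO_ID["<EOS>"], MASK_ID}


def mask_tokens(ids, mask_prob=0.15, seed=0):
    """Swap a random subset of non-special ids for MASK_ID (simplified).

    Returns (masked_ids, labels): labels keeps the original id at masked
    positions and None elsewhere, since only masked positions are predicted.
    """
    rng = random.Random(seed)
    masked_ids, labels = [], []
    for tok_id in ids:
        if tok_id not in SPECIAL_IDS and rng.random() < mask_prob:
            masked_ids.append(MASK_ID)
            labels.append(tok_id)
        else:
            masked_ids.append(tok_id)
            labels.append(None)
    return masked_ids, labels


sentence = ["<BOS>", "the", "cat", "sat", "on", "the", "mat", "<EOS>"]
ids = [TOKEN_TO_ID[t] for t in sentence]
masked_ids, labels = mask_tokens(ids, mask_prob=0.3, seed=1)
print([VOCAB[i] for i in masked_ids])
```

Masking is nothing more than replacing an id with the [MASK] id; the embedding layer treats it like any other vocabulary entry.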

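In a Vision Transformer it can likewise stand in for masked image patches. (In MAE specifically, the encoder only sees the visible patches; the mask tokens are added afterwards, at the decoder input.)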

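Is the mask token shared? Yes: one learned vector is reused at every masked position, and positional embeddings are added so the model can tell the masked positions apart.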

[CLS], patch1, patch2, [MASK], patch4, [MASK], ...
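A minimal Python sketch of building the sequence above with a single shared mask vector. Everything here is illustrative: `build_sequence`, the tiny `DIM`, and the random vectors (which would be learned parameters in a real model) are hypothetical, and vectors are plain lists to stay dependency-free. It puts [MASK] directly in the input, as in the sequence above, rather than following MAE's encoder/decoder split.

```python
import random

DIM = 4  # hypothetical tiny embedding size


def rand_vec(rng, dim=DIM):
    """Small random vector, standing in for a learned embedding."""
    return [rng.gauss(0.0, 0.02) for _ in range(dim)]


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def build_sequence(patch_embs, masked, cls_token, mask_token, pos_embs):
    """Build [CLS], patch1, patch2, [MASK], patch4, [MASK], ...

    masked: set of 0-based patch indices to hide.
    mask_token: ONE vector shared by every masked position.
    pos_embs: len(patch_embs) + 1 vectors; index 0 belongs to [CLS].
    """
    seq = [add(cls_token, pos_embs[0])]
    for i, emb in enumerate(patch_embs):
        tok = mask_token if i in masked else emb
        # Position is added after substitution, so identical mask tokens
        # still end up distinguishable by where they sit.
        seq.append(add(tok, pos_embs[i + 1]))
    return seq


rng = random.Random(0)
num_patches = 6
patches = [rand_vec(rng) for _ in range(num_patches)]
cls_token = rand_vec(rng)
mask_token = rand_vec(rng)
pos_embs = [rand_vec(rng) for _ in range(num_patches + 1)]

# Hide patch3 and patch5 (0-based indices 2 and 4), matching the example sequence.
seq = build_sequence(patches, {2, 4}, cls_token, mask_token, pos_embs)
```

Both masked positions start from the same `mask_token`, so `seq[3]` and `seq[5]` differ only through their positional embeddings.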