Mask Token
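Came across this while reading MAE and BERT. It’s essentially just another special token with its own learned embedding, like <BOS> / <EOS> (see Transformer), except that it stands in for an input the model is not allowed to see.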
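The same idea works in a Vision Transformer: replace the embeddings of the masked patches with the mask token. (MAE is a variation on this: the encoder only sees the visible patches, and the mask tokens are inserted later, in the decoder.)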
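Is the mask token shared? Yes: a single learned vector is used at every masked position. Positional embeddings are added on top, so the model can still tell the masked positions apart.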
[CLS], patch1, patch2, [MASK], patch4, [MASK], ...
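A minimal NumPy sketch of how a sequence like the one above could be built, assuming BEiT/SimMIM-style in-place replacement (not MAE's drop-then-reinsert scheme). All names (`build_sequence`, `mask_token`, `cls_token`, `pos_emb`) are hypothetical, and the vectors are random arrays standing in for what would be learnable parameters in a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 6, 4

# Hypothetical patch embeddings for one image: shape (num_patches, dim).
patch_emb = rng.normal(size=(num_patches, dim))

# ONE shared mask vector. In a real model this would be a learnable
# parameter; here it is just a fixed random array for illustration.
mask_token = rng.normal(size=dim)

# Hypothetical [CLS] vector (also learnable in a real model).
cls_token = rng.normal(size=dim)

# Positional embeddings for [CLS] + every patch position.
pos_emb = rng.normal(size=(num_patches + 1, dim))


def build_sequence(patch_emb, mask, mask_token, cls_token, pos_emb):
    """Swap masked patches for the shared mask token, prepend [CLS], add positions.

    mask: boolean array of shape (num_patches,), True = masked.
    """
    # mask[:, None] is (N, 1) and mask_token[None, :] is (1, D), so the same
    # mask vector is broadcast into every masked row.
    tokens = np.where(mask[:, None], mask_token[None, :], patch_emb)
    seq = np.concatenate([cls_token[None, :], tokens], axis=0)
    # Without this step, all [MASK] rows would be identical vectors.
    return seq + pos_emb


# Mask patch3 and patch5 (0-based indices 2 and 4), as in the sequence above.
mask = np.array([False, False, True, False, True, False])
seq = build_sequence(patch_emb, mask, mask_token, cls_token, pos_emb)
print(seq.shape)  # (7, 4)
```

Because the same `mask_token` fills every masked slot, the only thing that distinguishes rows 3 and 5 of `seq` is their positional embedding.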