Embedding
An embedding is a mapping from tokens to vectors.
The initial embedding is essentially a lookup table from token IDs to embedding vectors.
This happens in the first pass, before any attention is applied.
The attention blocks then compensate for the different meanings that words can have.
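A minimal sketch of that lookup table, using PyTorch's `nn.Embedding`; the vocabulary size, embedding dimension, and token IDs below are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical sizes: real models use vocabularies of ~50k tokens and hundreds of dimensions.
vocab_size = 50_000
embedding_dim = 768

# The initial embedding is literally a learned lookup table:
# one row of the weight matrix per token ID.
embedding_table = nn.Embedding(vocab_size, embedding_dim)

# Suppose the tokenizer mapped a sentence to these token IDs (made-up values).
token_ids = torch.tensor([[101, 2023, 2003, 1037, 2944, 102]])

# Indexing the table returns one vector per token, before any attention is applied.
vectors = embedding_table(token_ids)
print(vectors.shape)  # torch.Size([1, 6, 768])
```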
The embedding visualized:
If the model uses contextual embeddings (like BERT or GPT), the vector representation of a token depends on the context in which it appears.
After tokenization, will a word like "model" always map to the same vector?
It depends on the kind of embedding you use:
- Static embeddings (like Word2Vec, GloVe): The word “model” will have the same vector every time, regardless of context.
- Contextual embeddings (like BERT, GPT): The word “model” will have different vectors depending on the surrounding words. For example:
  - “This is a model airplane.” → Vector A
  - “She is a fashion model.” → Vector B
Even though “model” gets the same token ID from the tokenizer, the output vector varies with the surrounding context in contextual models.
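A sketch of this difference, assuming the Hugging Face `transformers` library and `bert-base-uncased` as the contextual model; the two example sentences are the ones above:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def vector_for(word: str, sentence: str) -> torch.Tensor:
    """Return the contextual vector the model produces for `word` inside `sentence`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)
    # Find where the token for `word` sits in this sentence (same token ID in both sentences).
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = (inputs["input_ids"][0] == word_id).nonzero()[0].item()
    return hidden[position]

# Same token ID for "model" in both sentences...
vec_a = vector_for("model", "This is a model airplane.")
vec_b = vector_for("model", "She is a fashion model.")

# ...but the output vectors differ because the surrounding context differs.
similarity = torch.nn.functional.cosine_similarity(vec_a, vec_b, dim=0)
print(similarity.item())  # less than 1.0: the two occurrences of "model" get different vectors
```

With a static embedding like Word2Vec, the analogous lookup would return the identical vector for both sentences, since there is only one entry per word in the table.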