N-Gram
The goal is to generate text. We can do this with a probabilistic model over word sequences.
Andrej Karpathy first introduced me to this idea.
Resources
Example (chain rule): P("I saw a van") = P("I") x P("saw" | "I") x P("a" | "I saw") x P("van" | "I saw a")
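Written out in general form (a LaTeX rendering added here for clarity; the notation w_i for the i-th word is an assumption, not from the original notes), the chain rule factorizes a sentence word by word:

```latex
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```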
But how feasible is this?
- Not really: the number of possible histories grows too quickly, so we limit the context instead. This is the n-gram.
- Basic idea: the probability of the next word depends only on the previous (N - 1) words (see the formula after this list).
- N = 1: unigram model (words are treated as independent of context)
- N = 2: bigram model (condition only on the previous word)
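The n-gram (Markov) assumption truncates the history to the last N - 1 words; written out (again a LaTeX sketch added for clarity, not from the original notes):

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \dots, w_{i-1}),
\qquad \text{bigram } (N = 2):\; P(w_i \mid w_{i-1})
```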
State-of-the-art n-gram models rarely go above 5-grams.
We apply Laplace (add-one) smoothing to the bigram probabilities so that unseen bigrams do not get zero probability.
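A minimal sketch of a bigram model with Laplace smoothing in Python, assuming a tiny illustrative corpus; the corpus, function names, and tokenization are placeholders, not from the original notes:

```python
import random
from collections import Counter, defaultdict

# Toy corpus; in practice this would come from a real text file (illustrative assumption).
corpus = "i saw a van . i saw a cat . the cat saw a van .".split()

# Count bigrams: bigram_counts[prev][word] = how often `word` follows `prev`.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

vocab = sorted(set(corpus))
V = len(vocab)

def bigram_prob(prev, word):
    """Laplace-smoothed P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)."""
    counts = bigram_counts[prev]
    return (counts[word] + 1) / (sum(counts.values()) + V)

def sample_next(prev):
    """Sample the next word from the smoothed bigram distribution."""
    probs = [bigram_prob(prev, w) for w in vocab]
    return random.choices(vocab, weights=probs, k=1)[0]

# Generate a short sequence starting from "i".
word = "i"
generated = [word]
for _ in range(8):
    word = sample_next(word)
    generated.append(word)
print(" ".join(generated))
```

With add-one smoothing, every pair gets at least probability 1 / (count(prev) + V), so sampling never hits a zero-probability dead end on bigrams that were absent from the training text.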