N-Gram

The goal is to generate text. We can do this with a probabilistic model of language: assign a probability to the next word given the words that came before it.

Andrej Karpathy was the first to introduce me to this idea.


Example (chain rule over the full history): P("I saw a van") = P("I") × P("saw" | "I") × P("a" | "I saw") × P("van" | "I saw a")
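A minimal sketch of that factorization; the conditional probability values below are placeholders for illustration only, not estimates from any corpus:

```python
# Chain-rule factorization of a sentence probability.
# NOTE: these conditional probabilities are made-up placeholder values,
# only to show how the factors multiply together.
conditionals = [
    0.010,  # P("I")
    0.050,  # P("saw" | "I")
    0.300,  # P("a" | "I saw")
    0.002,  # P("van" | "I saw a")
]

p_sentence = 1.0
for p in conditionals:
    p_sentence *= p

print(p_sentence)  # P("I saw a van")
```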

But how feasible is this?

  • Not really: most long histories never appear in the training data, so their conditional probabilities can't be estimated. Instead we limit the context: the n-gram.
  • Basic idea (Markov assumption): the probability of the next word depends only on the previous (N − 1) words.

  • N = 1: unigram model, where each word is predicted with no context at all
  • N = 2: bigram model, where the next word depends only on the single previous word (see the sketch below)
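A minimal bigram sketch along these lines; the toy corpus and function names are just illustrative, not from any particular reference:

```python
from collections import defaultdict, Counter
import random

# Toy corpus: in practice this would be a large text collection.
corpus = "I saw a van . I saw a cat . a cat saw a van .".split()

# Count bigrams: for each word, count which words follow it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def bigram_prob(prev, nxt):
    """P(nxt | prev) estimated from counts (no smoothing)."""
    counts = following[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

def sample_next(prev):
    """Sample the next word proportionally to the bigram counts."""
    counts = following[prev]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

print(bigram_prob("saw", "a"))  # P("a" | "saw") = 1.0 in this toy corpus

# Generation: repeatedly sample the next word given only the previous one.
word = "I"
for _ in range(5):
    word = sample_next(word)
    print(word, end=" ")
```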

State-of-the-art n-gram models rarely go above a 5-gram.

We apply Laplace (add-one) smoothing to the bigram probabilities, so that bigrams never seen in training get a small nonzero probability instead of zero.
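A sketch of add-one smoothing on the same toy setup as above (again, the corpus and names are only illustrative):

```python
from collections import defaultdict, Counter

# Same toy corpus as in the earlier sketch.
corpus = "I saw a van . I saw a cat . a cat saw a van .".split()
vocab = set(corpus)

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def smoothed_bigram_prob(prev, nxt):
    """P(nxt | prev) with add-one (Laplace) smoothing:
    (count(prev, nxt) + 1) / (count(prev) + |V|)."""
    counts = following[prev]
    return (counts[nxt] + 1) / (sum(counts.values()) + len(vocab))

print(smoothed_bigram_prob("saw", "a"))    # seen bigram: high probability
print(smoothed_bigram_prob("van", "cat"))  # unseen bigram: small but nonzero
```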