Additive Smoothing

Saw in CS451 for N-Gram as a way to smooth count data.

https://en.wikipedia.org/wiki/Additive_smoothing

Very simple, just start each count at 1, not 0.

But why do we need this?

Because 0 probabilities can become a problem. a single zero probability nullifies the entire product.

Without smoothing, the model assumes that only n-grams observed in the training data are valid, which doesn’t reflect real-world language usage.

There are other smoothing techniques: