Additive Smoothing
Saw this in CS451, applied to n-gram models as a way to smooth count data.
https://en.wikipedia.org/wiki/Additive_smoothing
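Very simple: add 1 to every count, so each count effectively starts at 1 instead of 0, then normalize as usual. With a pseudocount of 1 this is also called add-one (Laplace) smoothing.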
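A minimal sketch of the idea in Python, assuming a fixed, known vocabulary and a pseudocount `alpha` (alpha = 1 gives the "start at 1" rule). The function name `smoothed_probs` and the toy data are made up for illustration, not taken from the course.

```python
from collections import Counter

def smoothed_probs(counts, vocab, alpha=1.0):
    """Additive smoothing: P(w) = (count(w) + alpha) / (total + alpha * |V|)."""
    total = sum(counts[w] for w in vocab)
    denom = total + alpha * len(vocab)
    # Counter returns 0 for unseen words, so they still get alpha / denom.
    return {w: (counts[w] + alpha) / denom for w in vocab}

# Toy data (hypothetical): "c" is in the vocabulary but never observed.
counts = Counter(["a", "a", "b"])
vocab = ["a", "b", "c"]
probs = smoothed_probs(counts, vocab)
```

Note the denominator grows by alpha times the vocabulary size, which is what keeps the smoothed values summing to 1.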
But why do we need this?
Because zero probabilities are a problem: the probability of a sequence is a product of n-gram probabilities, so a single zero nullifies the entire product.
Without smoothing, the model assumes that only n-grams observed in the training data are valid, which doesn’t reflect real-world language usage.
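To see the zero problem concretely, here is a small sketch with a bigram model on a toy corpus. Everything here (the corpus, the helper names `bigram_prob` and `sentence_prob`) is an assumption for illustration, and setting `alpha = 0` stands in for "no smoothing".

```python
from collections import Counter

corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])  # how often each word appears as a bigram's first word

def bigram_prob(prev, word, alpha):
    # Additive smoothing over the vocabulary; alpha = 0 means no smoothing.
    # Caveat: with alpha = 0 this divides by zero if prev was never seen as a context.
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * len(vocab))

def sentence_prob(words, alpha):
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word, alpha)
    return p

# ("cat", "on") never occurs in the corpus, the other bigrams do.
sentence = "the cat on the mat".split()
unsmoothed = sentence_prob(sentence, alpha=0)  # 0.0, one unseen bigram kills the product
smoothed = sentence_prob(sentence, alpha=1)    # small but nonzero
```

The sentence is perfectly plausible, yet the unsmoothed model rules it out entirely because of one bigram it happened not to see.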
There are other smoothing techniques: