Label Smoothing
This is a trick to make your model less confident. Article here
The idea was introduced to me by Andrej Karpathy through his lecture.
Basically, you add 1 to every count, so that when you take the negative log likelihood, you guarantee that the probability is never 0, and so the negative log likelihood never returns infinity.
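A minimal sketch of that add-1 idea on a toy count table (the counts and the `nll` helper are made up for illustration, not from the lecture):

```python
import math

# Hypothetical toy counts, e.g. how often each next-character was seen.
counts = {"a": 5, "b": 0, "c": 3}

def nll(counts, target, smoothing=0):
    # Add `smoothing` (e.g. 1) to every count so no probability is exactly 0.
    smoothed = {k: v + smoothing for k, v in counts.items()}
    total = sum(smoothed.values())
    p = smoothed[target] / total
    return -math.log(p)  # math.log(0) would raise, i.e. NLL would be infinite

# Without smoothing, P("b") = 0/8 and the NLL blows up.
# With add-1 smoothing, P("b") = 1/11 and the NLL is finite.
print(nll(counts, "b", smoothing=1))  # log(11) ≈ 2.398
```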
Hard one-hot labels

$$y_k = \begin{cases} 1 & k = \text{target} \\ 0 & \text{otherwise} \end{cases}$$

Label smoothing (ε ∈ [0,1], K classes)

$$y_k^{LS} = (1 - \varepsilon)\, y_k + \frac{\varepsilon}{K}$$

Cross-entropy loss with label smoothing

$$\mathcal{L} = -\sum_{k=1}^{K} y_k^{LS} \log p_k$$

Expanded form

$$\mathcal{L} = -(1 - \varepsilon) \log p_{\text{target}} - \frac{\varepsilon}{K} \sum_{k=1}^{K} \log p_k$$
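The smoothed-target cross-entropy can be sketched in a few lines of NumPy (the numbers here are an arbitrary example, not from any source):

```python
import numpy as np

def smooth_labels(y_onehot, eps, K):
    # y_ls = (1 - eps) * y + eps / K : mix the one-hot target with uniform mass
    return (1 - eps) * y_onehot + eps / K

def cross_entropy(p, y):
    # -sum_k y_k * log p_k
    return -np.sum(y * np.log(p))

K = 4
y = np.array([0.0, 1.0, 0.0, 0.0])    # hard one-hot target
p = np.array([0.1, 0.7, 0.1, 0.1])    # model's predicted probabilities
y_ls = smooth_labels(y, eps=0.1, K=K) # [0.025, 0.925, 0.025, 0.025]
print(cross_entropy(p, y), cross_entropy(p, y_ls))
```

Note the smoothed targets still sum to 1, and the loss with smoothed targets is strictly larger here because some mass now sits on the low-probability classes.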
OMG isn't this just Laplace smoothing? Yeah, basically the same thing: both mix in a little uniform probability mass so nothing is ever exactly 0.