is called the “logit”.
In code, we have simply:
```python
import numpy as np

def softmax(X):
    exps = np.exp(X)
    return exps / np.sum(exps)
```
This is the same function used in Multinomial Logistic Regression: it turns a vector of logits into a probability distribution over the classes (non-negative values that sum to 1).
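As a quick sanity check, calling the softmax defined above on an arbitrary logit vector (the values here are purely illustrative) gives non-negative outputs that sum to 1:

```python
probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # approximately [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0 (up to floating-point rounding)
```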
A loss of 0 is the theoretical minimum, but it can never actually be attained: reaching it would require the logit of the correct class to go towards infinity and the logits of the incorrect classes to go towards negative infinity. Even a logit that dominates the others still leaves some probability on the wrong classes:
```python
import torch

logits = torch.tensor([5.0, 1.0, 0.0, 1.0])
torch.softmax(logits, dim=0)
# tensor([0.9584, 0.0176, 0.0065, 0.0176])
```
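To see how this plays out in the loss itself, here is a small sketch using torch.nn.functional.cross_entropy, assuming class 0 is the correct class: raising its logit keeps shrinking the loss, but the loss only reaches 0 in the limit of an infinite logit.

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])  # class 0 is assumed to be the correct class
for correct_logit in [1.0, 5.0, 10.0, 20.0]:
    logits = torch.tensor([[correct_logit, 0.0, 0.0, 0.0]])  # shape (1, num_classes)
    loss = F.cross_entropy(logits, target)
    # The printed loss keeps decreasing but never hits exactly 0 in exact arithmetic.
    print(f"correct logit {correct_logit:5.1f} -> loss {loss.item():.2e}")
```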
The softmax function is prone to two numerical issues (both demonstrated below):
- Overflow: it occurs when very large numbers are approximated as infinity.
- Underflow: it occurs when very small numbers (near zero on the number line) are approximated (i.e. rounded) to zero.
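Both failure modes are easy to trigger with the naive implementation from earlier (restated here as naive_softmax; the input values are arbitrary, chosen only to force the problem):

```python
import numpy as np

def naive_softmax(X):
    exps = np.exp(X)
    return exps / np.sum(exps)

# Overflow: exp(1000.0) does not fit in a float64 and becomes inf,
# so inf / inf produces nan.
print(naive_softmax(np.array([1000.0, 0.0])))        # [nan  0.] plus a RuntimeWarning

# Underflow: exp(-1000.0) rounds to 0, the sum is 0, and 0 / 0 produces nan.
print(naive_softmax(np.array([-1000.0, -1000.0])))   # [nan nan] plus a RuntimeWarning
```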
To combat these issues when computing the softmax, a common trick is to shift the input vector by subtracting its maximum element from every element. For the input vector x, define a shifted vector z such that:

z = x - max(x)

Adding the same constant to every element does not change the softmax, so softmax(z) equals softmax(x); but the largest entry of z is now 0, so exp() cannot overflow, and the e^0 = 1 term keeps the denominator from underflowing to 0.
```python
def stable_softmax(x):
    # Assumes x is a vector
    z = x - np.max(x)       # shift so the largest exponent is 0
    exps = np.exp(z)
    return exps / np.sum(exps)
```
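Feeding the same problematic inputs through stable_softmax (as defined above) now produces finite, sensible results:

```python
print(stable_softmax(np.array([1000.0, 0.0])))        # [1. 0.]
print(stable_softmax(np.array([-1000.0, -1000.0])))   # [0.5 0.5]
```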