Softmax Function
The softmax function maps a vector of raw scores x into a probability distribution: softmax(x)_i = exp(x_i) / Σ_j exp(x_j). Each raw input score x_i is called a “logit”.
In code, we have simply:
import numpy as np

def softmax(X):
    # Exponentiate each score, then normalize so the outputs sum to 1
    exps = np.exp(X)
    return exps / np.sum(exps)
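As a quick sanity check (the input values here are arbitrary), the outputs are non-negative and sum to 1, i.e. a probability distribution:

softmax(np.array([1.0, 2.0, 3.0]))
# array([0.09003057, 0.24472847, 0.66524096])  -- sums to 1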
We use the Softmax Function to compute the Cross-Entropy Loss.
- Multinomial Logistic Regression
- probability distribution
A loss of 0 is the theoretical minimum, but it is only approached in the limit: the logit of the correct class should go towards infinity, while the logits of the incorrect classes should go towards negative infinity.
import torch

logits = torch.tensor([4.0, 0.0, -1.0, 0.0])
torch.softmax(logits, dim=0)
# tensor([0.9584, 0.0176, 0.0065, 0.0176])
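A minimal sketch of how the cross-entropy loss follows from these probabilities, assuming class index 0 is the correct class (the target here is chosen only for illustration). The loss is the negative log of the probability assigned to the correct class, so it shrinks towards 0 as that class's logit grows:

import torch.nn.functional as F

batch = torch.tensor([[4.0, 0.0, -1.0, 0.0]])  # shape (1, 4): a batch of one example
target = torch.tensor([0])                     # assume class 0 is the correct class
F.cross_entropy(batch, target)
# ≈ 0.0425, i.e. -log(0.9584)

F.cross_entropy(torch.tensor([[100.0, 0.0, 0.0, 0.0]]), target)
# essentially 0: the correct class already gets probability ~1

Note that torch.nn.functional.cross_entropy applies log-softmax internally, so it takes raw logits rather than probabilities.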
Numeric Stability
https://stackoverflow.com/posts/49212689/timeline
The softmax function is prone to two numerical issues:
- Overflow: occurs when very large numbers are approximated as infinity.
- Underflow: occurs when very small numbers (near zero on the number line) are rounded to zero.
Either one can make the naive softmax return nan, as the quick check below shows.
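A quick check of both failure modes using the naive softmax defined above (the input values are chosen only to trigger them; NumPy will emit runtime warnings here):

# Overflow: exp(1000) is inf in float64, so the ratio becomes inf/inf = nan
softmax(np.array([1000.0, 1000.0]))
# array([nan, nan])

# Underflow: exp(-1000) rounds to 0, so we divide 0 by 0
softmax(np.array([-1000.0, -1000.0]))
# array([nan, nan])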
To combat these issues when computing softmax, a common trick is to shift the input vector by subtracting its maximum element from all elements. For the input vector x, define z such that:

z = x - max(x)

Because softmax is invariant to adding a constant to every input, softmax(z) = softmax(x); but the largest entry of z is 0, so the exponentials never overflow and the denominator is at least exp(0) = 1.
def stable_softmax(x):
    # Assumes x is a 1-D vector
    z = x - np.max(x)   # largest entry of z is 0, so exp never overflows
    exps = np.exp(z)
    return exps / np.sum(exps)
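Re-running the same illustrative inputs through the shifted version, the nans are gone, and well-behaved inputs give the same result as the naive softmax:

stable_softmax(np.array([1000.0, 1000.0]))
# array([0.5, 0.5])

stable_softmax(np.array([-1000.0, -1000.0]))
# array([0.5, 0.5])

stable_softmax(np.array([1.0, 2.0, 3.0]))
# array([0.09003057, 0.24472847, 0.66524096])  -- identical to the earlier result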