Softmax Function

The softmax function maps a vector of $K$ real-valued scores to a probability distribution:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}}$$

Each raw input $x_i$ is called a "logit".

In code, we have simply:

import numpy as np

def softmax(X):
    exps = np.exp(X)            # elementwise e^x
    return exps / np.sum(exps)  # normalize so the outputs sum to 1
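For example:

softmax(np.array([1.0, 2.0, 3.0]))
# array([0.09003057, 0.24472847, 0.66524096])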

We use the softmax function to turn the logits into a probability distribution over the classes, which we then feed into the cross-entropy loss. This is the multiclass generalization of logistic regression, known as multinomial logistic regression; a sketch of how the two pieces combine follows.
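A minimal sketch, reusing the softmax above and assuming y is the integer index of the correct class (the cross_entropy name and signature here are illustrative, not from the original):

def cross_entropy(X, y):
    # X: vector of logits; y: index of the correct class (illustrative helper)
    p = softmax(X)        # predicted probability distribution
    return -np.log(p[y])  # negative log-probability of the true class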

A loss of 0 is the theoretical minimum, but it is only reached in the limit: the softmax assigns probability exactly 1 to the correct class only when its logit goes towards infinity (or, equivalently, the logits of the incorrect classes go towards negative infinity). Even a clearly dominant logit leaves some probability mass on the other classes:

import torch

logits = torch.tensor([4.0, 0.0, -1.0, 0.0])
torch.softmax(logits, dim=0)
# tensor([0.9584, 0.0176, 0.0065, 0.0176])
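Pushing the correct-class logit higher drives the output towards, but never exactly to, a one-hot distribution:

torch.softmax(torch.tensor([100.0, 0.0, 0.0, 0.0]), dim=0)
# ~tensor([1., 0., 0., 0.]); the losing entries are tiny but nonzero (on the order of 1e-44)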

Numeric Stability

The softmax function is prone to two numerical issues, both demonstrated after the list below:

  • Overflow: occurs when very large numbers (here, $e^{x_i}$ for large logits) exceed the floating-point range and are approximated as infinity
  • Underflow: occurs when very small positive numbers (near zero on the number line) are rounded to zero
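With the naive softmax above, both failure modes produce NaN values (the inputs are extreme values chosen purely for illustration):

softmax(np.array([1000.0, 0.0]))       # np.exp(1000) overflows to inf, giving array([nan, 0.])
softmax(np.array([-1000.0, -1000.0]))  # both exponentials underflow to 0, giving array([nan, nan]) from 0/0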

To combat these issues when computing the softmax, a common trick is to shift the input vector by subtracting its maximum element from every element. For the input vector $x$, define $z$ such that:

$$z = x - \max(x)$$

Because adding a constant to every logit leaves the softmax output unchanged, $\mathrm{softmax}(z) = \mathrm{softmax}(x)$. After the shift the largest exponent is $e^0 = 1$, so overflow cannot occur, and the denominator is at least 1, so underflow cannot cause a division by zero.

def stable_softmax(x):
    # Assumes x is a 1-D vector
    z = x - np.max(x)   # shift so the largest entry is 0
    exps = np.exp(z)
    return exps / np.sum(exps)
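A quick check that the shifted version handles the inputs that broke the naive one:

stable_softmax(np.array([1000.0, 0.0]))       # array([1., 0.])
stable_softmax(np.array([-1000.0, -1000.0]))  # array([0.5, 0.5])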