# Softmax Function

The softmax function turns a vector of raw scores into a probability distribution:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

Each raw input $x_i$ is called the “logit”.

In code, we have simply:

```python
import numpy as np

def softmax(X):
    exps = np.exp(X)
    return exps / np.sum(exps)
```

We use the Softmax Function to compute the Cross-Entropy Loss.

• Multinomial Logistic Regression
• probability distribution
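As a sketch of how the two fit together (the helper name `cross_entropy` is my own, not from a library), the loss for one example is the negative log of the softmax probability assigned to the correct class:

```python
import numpy as np

def cross_entropy(logits, target):
    """Cross-entropy loss for one example: -log(softmax(logits)[target])."""
    z = logits - np.max(logits)                 # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))   # log-softmax in one pass
    return -log_probs[target]

# Small loss when the target logit dominates the others:
cross_entropy(np.array([5.0, 0.0, 0.0]), target=0)  # ≈ 0.0134
```

Computing log-softmax directly, instead of taking `np.log` of the softmax output, avoids losing precision when the probability is close to 0 or 1.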

A loss of 0 is the theoretical minimum; to approach it, the logit of the correct class should go towards positive infinity while the logits of the incorrect classes go towards negative infinity.

```python
import torch

logits = torch.tensor([100.0, 0.0, 0.0, 0.0])
torch.softmax(logits, dim=0)
# ≈ tensor([1., 0., 0., 0.]): the dominant logit takes essentially all the probability mass
```

### Numeric Stability

Source: https://stackoverflow.com/posts/49212689/timeline

The softmax function is prone to two numerical issues:

• Overflow: occurs when very large exponents exceed the floating-point range and are approximated as infinity
• Underflow: occurs when very small numbers (near zero on the number line) are rounded to zero
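Both failure modes are easy to reproduce in NumPy (a quick sketch; the exact cutoffs depend on the float type, here float64):

```python
import numpy as np

with np.errstate(over='ignore'):             # suppress the overflow warning
    big = np.exp(np.float64(1000.0))         # overflow: exceeds float64 range
small = np.exp(np.float64(-1000.0))          # underflow: rounds to 0.0

print(big, small)  # inf 0.0
```

A naive `softmax([1000.0, 0.0])` would then divide `inf` by `inf` and produce `nan`.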

To combat these issues when doing softmax computation, a common trick is to shift the input vector by subtracting its maximum element from every element. For the input vector x, define z = x − max(x). This leaves the output unchanged, because the constant factor e^(−max(x)) cancels from the numerator and denominator, but the largest exponent is now 0, so the exponentials can no longer overflow:

```python
import numpy as np

def stable_softmax(x):
    # Assumes x is a 1-D vector
    z = x - np.max(x)        # shift so the largest exponent is 0
    exps = np.exp(z)
    return exps / np.sum(exps)
```
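As a sanity check (restating `stable_softmax` so the snippet runs on its own), the shifted version succeeds on a large-logit input where the naive formula produces NaN:

```python
import numpy as np

def stable_softmax(x):
    z = x - np.max(x)        # largest exponent becomes 0
    exps = np.exp(z)
    return exps / np.sum(exps)

x = np.array([1000.0, 0.0])
with np.errstate(over='ignore', invalid='ignore'):
    naive = np.exp(x) / np.sum(np.exp(x))  # exp(1000) overflows to inf, inf/inf gives nan
stable = stable_softmax(x)

print(naive)   # [nan  0.]
print(stable)  # [1. 0.]
```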