Hinge Loss
The hinge loss is the loss function used to train maximum-margin classifiers, most notably the SVM. It is zero once the correct prediction is “confident enough” (above a margin) and otherwise grows linearly with the violation.
Binary form
For an intended output t = ±1 and a classifier score y:

ℓ(y) = max(0, 1 − t · y)

The loss is zero when t · y ≥ 1: the score has the right sign and a magnitude of at least 1. Below that, you pay a linear penalty.
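A minimal NumPy sketch of the binary form (the function name and example scores are illustrative, not from the original):

import numpy as np

def binary_hinge(t, y):
    # t: true label in {-1, +1}, y: raw classifier score
    return np.maximum(0, 1 - t * y)

print(binary_hinge(+1, 2.0))    # 0.0: past the margin, no loss
print(binary_hinge(+1, 0.5))    # 0.5: right sign but inside the margin
print(binary_hinge(+1, -1.0))   # 2.0: wrong sign, linear penalty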
Intuition
Hinge stops caring once you are past the margin: a point correctly classified just past the margin is as good as one far beyond it; both give zero loss. Contrast with cross-entropy, which keeps pushing confidence forever. The “hinge” shape (flat then linear) means the gradient is zero for safe points and constant for unsafe ones, which is why only the margin violators (the support vectors) drive the SVM solution.
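A quick numeric sketch of that flat-then-linear shape (the scores are made up):

t = 1.0                                       # true label
for y in [-1.0, 0.0, 0.5, 1.0, 2.0, 10.0]:    # classifier scores
    loss = max(0.0, 1.0 - t * y)
    grad = 0.0 if t * y >= 1 else -t          # subgradient of the loss w.r.t. the score
    print(f"score {y:5.1f}  loss {loss:5.1f}  dloss/dscore {grad:5.1f}")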
Multi-class SVM loss
From CS231n Lec 2, given linear scores s = f(x_i, W) = W x_i and true class y_i, sum the hinge over each wrong class:

L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)

The “+1” is the margin, an arbitrary constant. Its value doesn’t matter because W can rescale to absorb it; what matters is its presence (so that just-correct predictions still pay a penalty).
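An unvectorized sketch of this per-example loss, for clarity (argument names mirror the vectorized version below):

def L_i_unvectorized(x, y, W):
    # x: a column of data, y: index of the correct class, W: weight matrix
    delta = 1.0                           # the margin
    scores = W.dot(x)                     # one score per class
    loss_i = 0.0
    for j in range(W.shape[0]):           # loop over all classes
        if j == y:
            continue                      # skip the correct class
        loss_i += max(0, scores[j] - scores[y] + delta)
    return loss_i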
Properties:
- Min: 0 (achieved when the correct class beats every other class by at least the margin)
- Max: unbounded; it grows linearly with the total margin violation
- At init (small W, all scores ≈ 0): each wrong class contributes max(0, 0 − 0 + 1) = 1, so L_i ≈ number of classes − 1. So for CIFAR-10, the SVM loss should start near 9, a sanity check on iteration 0 (see the sketch after this list)
- Squared hinge, max(0, s_j − s_{y_i} + 1)², is sometimes used; it penalizes large violations more aggressively
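A minimal sanity-check sketch for the value at init (assumes 10 classes and all-zero scores, as for CIFAR-10 with a tiny W):

import numpy as np

num_classes = 10
scores = np.zeros(num_classes)           # tiny W => all scores roughly 0
y = 3                                    # arbitrary true class
margins = np.maximum(0, scores - scores[y] + 1)
margins[y] = 0
print(np.sum(margins))                   # 9.0 = num_classes - 1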
Vectorized implementation
import numpy as np

def L_i_vectorized(x, y, W):
    # x: a column of data, y: index of the correct class, W: weight matrix
    scores = W.dot(x)                              # one score per class
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0                                 # don't include j = y_i in the sum
    loss_i = np.sum(margins)
    return loss_i

Non-uniqueness of the optimum
If W achieves L_i = 0 for all i, then so does 2W, 3W, etc.: the loss is invariant to positive scaling once it bottoms out. This is why you need regularization (typically L2) to break the tie and pick a “small” W.
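A toy sketch of that tie-break (numbers hand-picked so the data loss bottoms out; the regularization strength is arbitrary):

import numpy as np

x = np.array([1.0, 2.0])
y = 0                                         # true class

# Hand-picked W: the correct class beats the others by more than the margin.
W = np.array([[ 3.0,  3.0],
              [ 0.0,  0.0],
              [-1.0, -1.0]])

def svm_loss(W):
    scores = W.dot(x)
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0
    return np.sum(margins)

lam = 0.1                                     # L2 regularization strength (arbitrary)
for scale in [1.0, 2.0, 10.0]:
    Ws = scale * W
    print(scale, svm_loss(Ws), lam * np.sum(Ws ** 2))
    # data loss stays 0 at every scale; the L2 term grows, so the smallest W wins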
Difference with Softmax
Hinge stops caring once the margin is met; cross-entropy never stops pushing. In practice both work; cross-entropy is now standard because it composes cleanly with softmax and gives probabilistic outputs.
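A quick two-class illustration of that difference (the scores are made up; class 0 is taken to be the correct class):

import numpy as np

for s in [0.5, 1.0, 2.0, 5.0, 10.0]:
    scores = np.array([s, 0.0])                    # correct-class score grows, other stays at 0
    hinge = max(0.0, scores[1] - scores[0] + 1)    # multi-class hinge with margin 1
    probs = np.exp(scores) / np.sum(np.exp(scores))
    xent = -np.log(probs[0])                       # softmax cross-entropy
    print(f"score {s:4.1f}  hinge {hinge:.3f}  cross-entropy {xent:.5f}")
    # hinge hits 0 at the margin and stays there; cross-entropy keeps shrinking forever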