Matrix Norm
In mathematics, a matrix norm is a vector norm in a vector space whose elements (vectors) are matrices (of given dimensions).
Just like vector norms, matrix norms are used to measure the size of a matrix.
Frobenius Norm / Hilbert-Schmidt Norm
The Frobenius norm of $A \in \mathbb{R}^{m \times n}$ is given by $\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$.
It treats the matrix as a flat vector: sum of squared entries, then square root. Easy to compute, but doesn't tell you anything about directional stretching.
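A minimal NumPy sketch of the "flat vector" view: summing squared entries and taking the square root agrees with the built-in Frobenius norm.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Frobenius norm: treat the matrix as a flat vector of entries
fro_manual = np.sqrt((A ** 2).sum())      # sqrt(1 + 4 + 9 + 16) = sqrt(30)
fro_numpy = np.linalg.norm(A, "fro")      # NumPy's built-in Frobenius norm

assert np.isclose(fro_manual, fro_numpy)
print(fro_manual)  # ≈ 5.477
```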
Spectral Norm (Operator 2-Norm)
The spectral norm of $A$ is its largest singular value (see SVD): $\|A\|_2 = \sigma_{\max}(A) = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}$.
Why?
Of all the matrix norms, the spectral norm answers the question: "what's the worst-case amount this matrix can stretch any input vector?" That makes it the natural tool for bounding gradient growth, Lipschitz constants, and stability of repeated matrix application.
Geometric intuition. A matrix $A$ maps the unit sphere to an ellipsoid. The semi-axes of that ellipsoid are the singular values $\sigma_1 \ge \sigma_2 \ge \dots$. The spectral norm is the longest semi-axis $\sigma_1$: the maximum stretch in any direction. The Frobenius norm, by contrast, is $\sqrt{\sum_i \sigma_i^2}$: total "energy" across all directions, not the worst case.
For symmetric / normal matrices the spectral norm equals $\max_i |\lambda_i|$, the largest eigenvalue in absolute value. For general matrices, use the singular values of $A$ (equivalently, the eigenvalues of $A^\top A$, then square-root).
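A numerical check of the claims above, using NumPy: the spectral norm comes from the SVD, no unit vector is stretched beyond it, and for a symmetric matrix it coincides with the largest absolute eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Spectral norm = largest singular value = matrix 2-norm
sigma_max = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(sigma_max, np.linalg.norm(A, 2))

# No unit vector is stretched by more than sigma_max
x = rng.standard_normal(3)
x /= np.linalg.norm(x)
assert np.linalg.norm(A @ x) <= sigma_max + 1e-12

# For a symmetric matrix (A^T A), spectral norm = max |eigenvalue|
S = A.T @ A
assert np.isclose(np.linalg.norm(S, 2), np.abs(np.linalg.eigvalsh(S)).max())
```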
Why it shows up in deep learning
Backprop multiplies the gradient by a Jacobian matrix $J_\ell$ at each layer, so the worst-case per-layer amplification is $\|J_\ell\|_2$. Stack $L$ layers and you get $\prod_{\ell=1}^{L} \|J_\ell\|_2$: exponential in depth. See Vanishing Gradient Problem for the full Jacobian → singular value walkthrough, and Kaiming init for keeping that product $\approx 1$.
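A toy sketch of the per-layer bound (the layer count and Jacobians below are made up for illustration): by submultiplicativity, the spectral norm of the chained Jacobian product never exceeds the product of the per-layer spectral norms.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 10, 32
# Hypothetical per-layer Jacobians, standing in for a real network's
Js = [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(L)]

# Chain the Jacobians as backprop would
prod = np.eye(n)
for J in Js:
    prod = J @ prod

# Submultiplicativity: ||J_L ... J_1||_2 <= prod of ||J_l||_2
chain_norm = np.linalg.norm(prod, 2)
bound = np.prod([np.linalg.norm(J, 2) for J in Js])
assert chain_norm <= bound + 1e-9
```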
In RNNs the same $W$ is applied $t$ times, so the spectral radius $\rho(W)$ (largest eigenvalue magnitude) of $W$ governs whether backprop-through-time explodes or vanishes.
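A numerical sketch of the spectral-radius dichotomy, with a made-up recurrent matrix: rescale one matrix so its spectral radius sits below or above 1, then watch $\|W^t\|_2$ shrink or blow up with $t$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
W = rng.standard_normal((n, n)) / np.sqrt(n)       # toy recurrent weights
rho = np.abs(np.linalg.eigvals(W)).max()           # spectral radius

# Rescale so the spectral radius is below / above 1
W_shrink = 0.5 * W / rho                           # rho = 0.5 → vanishing
W_grow = 2.0 * W / rho                             # rho = 2.0 → exploding

t = 50
norm_shrink = np.linalg.norm(np.linalg.matrix_power(W_shrink, t), 2)
norm_grow = np.linalg.norm(np.linalg.matrix_power(W_grow, t), 2)
assert norm_shrink < 1e-3    # ~0.5^50, up to a polynomial transient factor
assert norm_grow > 1e3       # ||W^t||_2 >= rho(W)^t = 2^50
```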