Matrix Norm

In mathematics, a matrix norm is a vector norm in a vector space whose elements (vectors) are matrices (of given dimensions).

Just like a Vector Norm, matrix norms are used to measure the size of a matrix.

Frobenius Norm / Hilbert-Schmidt Norm

The Frobenius Norm is given by

$$\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}$$

It treats the matrix as a flat vector: sum of squared entries, then square root. Easy to compute, but it doesn't tell you anything about directional stretching.
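A minimal NumPy sketch (the matrix `A` is just an arbitrary example) showing the hand-rolled computation next to the built-in:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Flatten, square, sum, square-root: exactly the definition above.
fro_manual = np.sqrt(np.sum(A ** 2))

# NumPy's built-in Frobenius norm.
fro_builtin = np.linalg.norm(A, 'fro')

print(fro_manual, fro_builtin)  # both ~5.4772 (sqrt(30))
```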

Spectral Norm (Operator 2-Norm)

The spectral norm of $A$ is its largest singular value (see SVD):

$$\|A\|_2 = \sigma_{\max}(A) = \max_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2}$$
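A quick NumPy check that the operator 2-norm is the top singular value (again with an arbitrary example matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Largest singular value from the SVD (values are returned in descending order).
sigma_max = np.linalg.svd(A, compute_uv=False)[0]

# np.linalg.norm with ord=2 computes the spectral norm directly.
spec = np.linalg.norm(A, 2)

print(sigma_max, spec)  # both ~5.4650
```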

Why?

Of all the matrix norms, the spectral norm answers the question: "what's the worst-case amount this matrix can stretch any input vector?" That makes it the natural tool for bounding gradient growth, Lipschitz constants, and the stability of repeated matrix application.

Geometric intuition. A matrix $A$ maps the unit sphere to an ellipsoid. The semi-axes of that ellipsoid are the singular values $\sigma_1 \geq \sigma_2 \geq \dots$. The spectral norm is the longest semi-axis, i.e. the maximum stretch in any direction. The Frobenius norm, by contrast, is $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$: the total "energy" across all directions, not the worst case.
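A small numerical sketch of this picture, assuming a random 3×3 matrix and random unit directions (the sizes are arbitrary): no direction gets stretched by more than $\sigma_{\max}$, and the Frobenius norm matches $\sqrt{\sum_i \sigma_i^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

sigmas = np.linalg.svd(A, compute_uv=False)

# Stretch of many random unit vectors: never exceeds the top singular value.
x = rng.standard_normal((3, 10000))
x /= np.linalg.norm(x, axis=0)          # normalize each column to unit length
stretch = np.linalg.norm(A @ x, axis=0)

print(stretch.max(), sigmas[0])                                 # max observed stretch <= sigma_max
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigmas ** 2)))   # Frobenius = sqrt(sum of sigma_i^2)
```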

For symmetric / normal matrices the spectral norm equals $\max_i |\lambda_i|$, the largest eigenvalue in absolute value. For general matrices, use the singular values of $A$ (equivalently, compute the eigenvalues of $A^\top A$, then take the square root).
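A sketch of both routes, with arbitrary example matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric case: spectral norm = largest |eigenvalue|.
S = rng.standard_normal((4, 4))
S = (S + S.T) / 2
print(np.linalg.norm(S, 2), np.abs(np.linalg.eigvalsh(S)).max())

# General case: spectral norm = sqrt(largest eigenvalue of A^T A).
A = rng.standard_normal((4, 4))
print(np.linalg.norm(A, 2), np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))
```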

Why it shows up in deep learning

Backprop multiplies the gradient by a Jacobian matrix $J_\ell$ at each layer, so the worst-case per-layer amplification is $\|J_\ell\|_2$. Stack $L$ layers and you get $\prod_{\ell=1}^{L} \|J_\ell\|_2$, which is exponential in depth. See Vanishing Gradient Problem for the full Jacobian → singular value walkthrough, and Kaiming init for keeping that product $\approx 1$.
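A toy illustration of the compounding: the random Gaussian matrices below are only stand-ins for real layer Jacobians, and the `scale` values are chosen just to land the per-layer spectral norm above or below 1.

```python
import numpy as np

rng = np.random.default_rng(2)
depth, width = 50, 64

def worst_case_gain(scale):
    # Running product of per-layer spectral norms ||J_l||_2.
    gain = 1.0
    for _ in range(depth):
        J = scale * rng.standard_normal((width, width)) / np.sqrt(width)
        gain *= np.linalg.norm(J, 2)
    return gain

print(worst_case_gain(1.0))   # per-layer norm ~2, so roughly 2^50: explodes
print(worst_case_gain(0.25))  # per-layer norm ~0.5, so roughly 0.5^50: vanishes
```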

In RNNs the same recurrent weight matrix $W$ is applied $T$ times, so the spectral radius (largest eigenvalue magnitude) of $W$ governs whether backprop-through-time explodes or vanishes.
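A sketch of the repeated-application effect; the rescaling factors below are arbitrary, chosen only to push the spectral radius just below or just above 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 32, 200

W = rng.standard_normal((n, n)) / np.sqrt(n)
rho = np.abs(np.linalg.eigvals(W)).max()   # spectral radius of W

for scale in (0.9 / rho, 1.1 / rho):       # spectral radius 0.9 vs 1.1
    Ws = scale * W
    g = rng.standard_normal(n)             # stand-in for a gradient vector
    for _ in range(T):
        g = Ws.T @ g                       # backprop-through-time multiplies by W^T each step
    print(np.abs(np.linalg.eigvals(Ws)).max(), np.linalg.norm(g))  # vanishes vs explodes
```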