Regularization
Regularization is any technique that discourages model complexity to prevent overfitting, in the spirit of Occam’s Razor. It is used widely in machine learning.
The general form adds a penalty to the loss:
$$L(W) = L_{data}(W) + \lambda R(W)$$
where:
- $\lambda$ is the regularization strength
- $R(W)$ penalizes model complexity (e.g. weight norm)
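As a minimal sketch of the general form (names are mine; numpy, using an L2 penalty for $R$):

```python
import numpy as np

def l2_penalty(W):
    # R(W): sum of squared weights
    return np.sum(W ** 2)

def total_loss(data_loss, W, lam):
    # L = L_data + lambda * R(W); lam is the regularization strength
    return data_loss + lam * l2_penalty(W)

W = np.array([0.5, -1.0, 2.0])
print(total_loss(1.0, W, lam=0.1))  # 1.0 + 0.1 * 5.25 = 1.525
```

Setting `lam = 0` recovers the unregularized data loss.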
Intuition
The data term wants the model to fit the training set; the penalty term wants the model to stay simple. $\lambda$ is the exchange rate between the two. A flexible model usually has many weight settings that fit the training data equally well; the penalty picks the simplest one from that bunch, which is the one most likely to generalize. Equivalently (Bayesian view), the penalty corresponds to a prior over weights, and minimizing loss plus penalty is MAP estimation.
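The Bayesian view can be written out; a sketch assuming a Gaussian prior $p(W) \propto e^{-\lambda \|W\|_2^2}$ and a data loss equal to the negative log-likelihood:

```latex
\hat{W}_{\text{MAP}}
  = \arg\max_W \big[ \log p(D \mid W) + \log p(W) \big]
  = \arg\min_W \big[ L_{\text{data}}(W) + \lambda \|W\|_2^2 \big]
```

So an L2 penalty is MAP estimation under a Gaussian prior; an L1 penalty similarly corresponds to a Laplace prior.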
Parameter penalties (reduce effective model capacity):
- L2 Regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
- L1 Regularization (Lasso): $R(W) = \sum_k \sum_l |W_{k,l}|$
- Elastic Net (L1 + L2): $R(W) = \sum_k \sum_l \beta W_{k,l}^2 + |W_{k,l}|$
- Max-norm regularization: constrain each weight vector to $\|w\|_2 \le c$
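The three penalties evaluated on a toy weight vector (a sketch; $\beta$ is the elastic-net mixing weight, value chosen arbitrarily):

```python
import numpy as np

W = np.array([1.0, -2.0, 0.0, 3.0])

l1 = np.sum(np.abs(W))                        # Lasso: sum |w| = 6.0
l2 = np.sum(W ** 2)                           # L2: sum w^2 = 14.0
beta = 0.5                                    # assumed mixing weight
elastic = np.sum(beta * W ** 2 + np.abs(W))   # 0.5 * 14 + 6 = 13.0
print(l1, l2, elastic)
```

Note that L2 punishes the large entry (3.0) much harder than L1 does, which is why L2 favors spread-out weights.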
Training-time regularization:
- Early Stopping
- Dropout
- Data Augmentation
- Stochastic depth
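Of these, dropout is the easiest to sketch; a minimal inverted-dropout implementation in numpy (my sketch, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, train=True):
    # Inverted dropout: zero each unit with probability p during training,
    # and scale survivors by 1/(1-p) so no rescaling is needed at test time.
    if not train:
        return x
    mask = (rng.random(x.shape) >= p) / (1 - p)
    return x * mask

x = np.ones(10000)
print(dropout(x).mean())  # close to 1.0 in expectation
```

The scaling keeps the expected activation the same at train and test time, so the test-time forward pass is unchanged.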
Model/structure choices (capacity control):
- Simpler model / fewer parameters
- Feature selection / dimensionality reduction (PCA)
- Ensembling (bagging, random forest): reduces variance
Linear models:
- SVM margin (hinge loss + $\lambda \|W\|_2^2$) acts like regularization via margin/penalty tradeoff
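A sketch of the hinge-loss-plus-penalty objective (multiclass SVM loss in the CS231n style; function and variable names are mine):

```python
import numpy as np

def svm_loss(W, X, y, lam):
    # Multiclass hinge (SVM) data loss plus an L2 penalty lam * sum(W^2)
    scores = X @ W                                   # (N, C) class scores
    N = len(y)
    correct = scores[np.arange(N), y][:, None]       # score of the true class
    margins = np.maximum(0.0, scores - correct + 1.0)
    margins[np.arange(N), y] = 0.0                   # don't count the true class
    return margins.sum() / N + lam * np.sum(W ** 2)

X = np.array([[1.0, 0.0], [0.0, 1.0]])               # 2 samples, 2 features
W = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 2.0]])     # 3 classes
y = np.array([0, 2])
print(svm_loss(W, X, y, lam=1e-3))  # data loss 0, penalty 1e-3 * 10 = 0.01
```

Here both samples are classified with margin at least 1, so only the penalty term contributes.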
Normalization (not strictly regularization but stabilizes / prevents overfitting):
- BatchNorm / LayerNorm
- Gradient Clipping
- Weight constraints: max-norm, spectral norm
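Gradient clipping is nearly a one-liner; a global-norm version as a sketch (names are mine):

```python
import numpy as np

def clip_grad(g, max_norm):
    # Global-norm clipping: rescale g if its L2 norm exceeds max_norm,
    # leave it untouched otherwise
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

g = np.array([3.0, 4.0])           # norm 5
print(clip_grad(g, max_norm=1.0))  # rescaled to norm 1: [0.6, 0.8]
```

Max-norm weight constraints work the same way, applied to each unit's weight vector after the update instead of to the gradient.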
L1 vs L2 preference
From the CS231n Lec 3 slides. Given $x = [1, 1, 1, 1]$, two weight vectors with the same dot product $w^\top x = 1$:
- $w_1 = [1, 0, 0, 0]$: L1 picks this (“sparse”)
- $w_2 = [0.25, 0.25, 0.25, 0.25]$: L2 picks this (“spread out”)
Both produce identical predictions on this $x$; the regularizer is what breaks the tie. Pick the penalty that matches your prior over what a “good” classifier looks like.
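The tie and the L2 tie-break can be checked numerically (a small sketch; note that on this exact pair the L1 norms also tie at 1, so it is L1's general geometry, not this particular example, that drives weights to exact zeros):

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])            # sparse
w2 = np.array([0.25, 0.25, 0.25, 0.25])        # spread out

print(w1 @ x, w2 @ x)                          # identical predictions: 1.0 1.0
print(np.sum(w1 ** 2), np.sum(w2 ** 2))        # L2 penalty: 1.0 vs 0.25 -> prefers w2
print(np.sum(np.abs(w1)), np.sum(np.abs(w2)))  # L1 penalty: both 1.0 (tied here)
```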
Why regularize?
- Express preferences over weights (which solution among the equivalent ones?)
- Make the model simple so it generalizes to test data
- Improve optimization by adding curvature (the loss landscape becomes more bowl-shaped)
From MATH213
Definition 2: Regularization
A function $\tilde{f}$ is the regularization of a function $f$ if
- For all $t$ in the domain of $f$, $\tilde{f}(t) = f(t)$
- For all $t$ such that $\lim_{s \to t} f(s)$ exists but $f(t)$ is undefined, $\tilde{f}(t) = \lim_{s \to t} f(s)$
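A concrete instance of Definition 2 (my example, not from the course notes), filling in a removable discontinuity:

```latex
f(t) = \frac{\sin t}{t},\ t \neq 0
\quad\Rightarrow\quad
\tilde{f}(0) = \lim_{s \to 0} \frac{\sin s}{s} = 1
```

Here $\tilde{f}$ agrees with $f$ everywhere except $t = 0$, where it fills in the limit.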
Theorem 1
If $\tilde{f}$ is the regularization of a function $f$ that has a finite number of discontinuities then $\mathcal{L}\{\tilde{f}\} = \mathcal{L}\{f\}$.
Also see Finite Zeros and Finite Poles.