Regularization
Regularization is any technique that helps prevent overfitting. It is used widely in machine learning and is related to the idea of Occam's Razor: prefer the simplest model that explains the data.
Regularization Methods:
Parameter Penalties
- L1 regularization, L2 regularization, elastic net, or max-norm regularization
These penalties reduce the effective capacity of the model.
Training-time regularization:
- Early stopping, data augmentation, dropout (dropout appears again in the neural-network-specific list below)
Model/structure choices (capacity control)
- Simpler model / fewer parameters
- Feature selection / dimensionality reduction (PCA)
- Ensembling (bagging, random forest): reduces variance
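As a rough illustration of the variance-reduction point, here is a minimal scikit-learn sketch (the dataset, estimator count, and hyperparameters are arbitrary choices, not from these notes) comparing a single high-variance decision tree to a bagged ensemble of the same trees:

```python
# Sketch: bagging as regularization via variance reduction.
# Assumes scikit-learn is installed; dataset and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)                 # high-variance base model
bagged_trees = BaggingClassifier(single_tree, n_estimators=100, random_state=0)

print("single tree CV accuracy :", cross_val_score(single_tree, X, y, cv=5).mean())
print("bagged trees CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())
```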
For linear models specifically
- SVM margin (hinge loss + C) acts like regularization via margin/penalty tradeoff
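To make the margin/penalty tradeoff concrete, here is a hedged scikit-learn sketch (the dataset and C values are illustrative): in scikit-learn's SVMs, C is an inverse regularization strength, so a smaller C means a stronger penalty on the weights and a wider margin.

```python
# Sketch: C in an SVM is an inverse regularization strength.
# Assumes scikit-learn; smaller C = stronger regularization = smaller weights.
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

weak_reg = LinearSVC(C=10.0, max_iter=10000).fit(X, y)    # weak penalty, fits data more tightly
strong_reg = LinearSVC(C=0.01, max_iter=10000).fit(X, y)  # strong penalty, larger margin

print("||w|| with C=10.0:", (weak_reg.coef_ ** 2).sum() ** 0.5)
print("||w|| with C=0.01:", (strong_reg.coef_ ** 2).sum() ** 0.5)
```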
Normalization and weight constraints
- BatchNorm / LayerNorm: not strictly regularization, but they stabilize training and can help reduce overfitting
- Gradient Clipping
- Weight constraints: max-norm, spectral norm (controls capacity); see the sketch below
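A minimal PyTorch-style sketch of one training step with the last two items (the model, data, clip threshold, and max-norm radius are illustrative choices): gradients are clipped before the optimizer step, and the weight matrix is projected back onto a max-norm ball afterwards.

```python
# Sketch: gradient clipping + max-norm weight constraint in one training step.
# Assumes PyTorch; model, data, and the thresholds are illustrative.
import torch
import torch.nn as nn

model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Gradient clipping: rescale gradients so their total L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()

# Max-norm constraint: project each row of the weight matrix onto an L2 ball of radius 3.
with torch.no_grad():
    model.weight.copy_(model.weight.renorm(p=2, dim=0, maxnorm=3.0))
```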
Regularization penalizes the complexity of the model. $\lambda$ is the regularization parameter, so we usually have $\lambda \geq 0$ (with $\lambda = 0$ recovering the unregularized loss).
We have the general formula
$$\tilde{J}(\theta) = J(\theta) + \lambda\,\Omega(\theta),$$
where $J(\theta)$ is the original data loss and $\Omega(\theta)$ is the complexity penalty.
Types
- L2 Regularization (Ridge)
- L1 Regularization (Lasso)
- Elastic Net (L1 + L2)
- Max-norm regularization
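As a concrete instance of the general formula above, here is a small NumPy sketch of the penalty term $\Omega(w)$ for the first three types (the weights, $\lambda$, and the elastic-net mixing ratio are arbitrary); max-norm is a constraint rather than an additive penalty, so it is not shown here.

```python
# Sketch of the penalty term Omega(w) for L2, L1, and elastic net.
# Weights, lambda, and the mixing ratio are illustrative.
import numpy as np

w = np.array([0.5, -1.2, 0.0, 3.0])
lam = 0.1     # regularization strength (lambda >= 0)
alpha = 0.5   # elastic-net mixing ratio between L1 and L2

l2_penalty = lam * np.sum(w ** 2)             # ridge: shrinks weights toward zero
l1_penalty = lam * np.sum(np.abs(w))          # lasso: encourages exact zeros (sparsity)
elastic_net = lam * (alpha * np.sum(np.abs(w)) + (1 - alpha) * np.sum(w ** 2))

# The regularized objective is then: J_reg = data_loss + penalty
print(l2_penalty, l1_penalty, elastic_net)
```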
Neural Network Specific:
- Dropout
- Batch Normalization
- Stochastic depth
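A minimal PyTorch sketch of the first two items (layer sizes and the dropout probability are illustrative): dropout randomly zeroes activations during training only, which is why switching between train() and eval() mode matters.

```python
# Sketch: dropout and batch normalization inside a small MLP.
# Assumes PyTorch; layer sizes and the dropout rate are illustrative.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # normalizes activations over the batch
    nn.ReLU(),
    nn.Dropout(p=0.5),    # randomly zeroes 50% of activations during training
    nn.Linear(64, 10),
)

model.train()  # dropout active, BatchNorm uses batch statistics
model.eval()   # dropout disabled, BatchNorm uses running statistics
```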
From MATH213
Definition 2: Regularization
A function $\tilde{f}$ is the regularization of a function $f$ if
- For all $t$ in the domain of $f$, $\tilde{f}(t) = f(t)$.
- For all $t$ such that $\lim_{\tau \to t} f(\tau)$ exists but $f(t)$ is undefined, $\tilde{f}(t) = \lim_{\tau \to t} f(\tau)$.
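A short worked example of this definition (the choice of $f$ is mine, not from the course notes): take $f(t) = \frac{\sin t}{t}$, which is undefined at $t = 0$ even though the limit there exists and equals 1. Its regularization is
$$\tilde{f}(t) = \begin{cases} \dfrac{\sin t}{t}, & t \neq 0 \\[4pt] 1, & t = 0 \end{cases}$$
so $\tilde{f}$ agrees with $f$ everywhere $f$ is defined and fills in the removable point with $\lim_{t \to 0} \frac{\sin t}{t} = 1$.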
Theorem 1
If $\tilde{f}$ is the regularization of a function $f$ that has a finite number of discontinuities, then $\mathcal{L}\{\tilde{f}\} = \mathcal{L}\{f\}$ (the two functions differ at only finitely many points, so their Laplace transforms agree).
Also see Finite Zeros and Finite Poles.