Regularization

Regularization is any technique used to prevent overfitting; it is used widely in machine learning. Related to the idea of Occam's Razor: prefer the simplest model that explains the data.

Regularization Methods:

Parameter Penalties

Adding a penalty on the parameters to the training objective reduces the effective capacity of the model.

Training-time regularization:

Model/structure choices (capacity control)

  • Simpler model / fewer parameters
  • Feature selection / dimensionality reduction (e.g. PCA)
  • Ensembling (bagging, random forests): averaging many high-variance models reduces variance (see the sketch below)
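
A minimal sketch of capacity control with scikit-learn (the synthetic dataset and hyperparameters are illustrative):

    # PCA shrinks the feature space; a random forest bags many
    # high-variance trees to reduce variance.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=500, n_features=50,
                               n_informative=10, random_state=0)
    model = make_pipeline(PCA(n_components=10),
                          RandomForestClassifier(n_estimators=200, random_state=0))
    print(cross_val_score(model, X, y, cv=5).mean())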

For linear models specifically

  • SVM margin (hinge loss + $C$): the penalty parameter $C$ trades margin width against training error, so a small $C$ acts like strong regularization (see the sketch below)
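
A sketch of that tradeoff with scikit-learn's LinearSVC, where C is the inverse of the regularization strength (the values swept are illustrative):

    # Small C -> strong regularization (wide margin, more slack allowed);
    # large C -> weak regularization (fits the training data harder).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    for C in (0.01, 1.0, 100.0):
        clf = LinearSVC(C=C, max_iter=10000)
        print(C, cross_val_score(clf, X, y, cv=5).mean())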

Normalization

  • BatchNorm / LayerNorm: not strictly regularization, but they stabilize training and often reduce overfitting
  • Gradient clipping: likewise a training stabilizer rather than a regularizer proper
  • Weight constraints: max-norm, spectral norm (controls capacity; see the sketch below)
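
A sketch of gradient clipping plus a max-norm constraint in a single training step (PyTorch; the model, data, and the norm cap of 3.0 are placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    # Gradient clipping: rescale the global gradient norm down to at most 1.0.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    opt.step()

    # Max-norm constraint: project each weight row back onto the ball of radius 3.
    with torch.no_grad():
        w = model.weight
        norms = w.norm(dim=1, keepdim=True).clamp(min=1e-12)
        w.mul_(norms.clamp(max=3.0) / norms)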

Regularization penalizes the complexity of the model. $\lambda$ is the regularization parameter, so we usually have $\lambda \geq 0$ (larger $\lambda$ means a stronger penalty).

We have a general formula: $\tilde{J}(\theta) = J(\theta) + \lambda\,\Omega(\theta)$, where $J(\theta)$ is the unregularized training loss and $\Omega(\theta)$ penalizes the size or complexity of the parameters.
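
As a concrete instance, a NumPy sketch of this objective with squared-error loss and an L2 penalty (i.e. ridge regression; the names here are illustrative):

    import numpy as np

    def penalized_loss(w, X, y, lam):
        # J~(w) = J(w) + lam * Omega(w)
        residual = X @ w - y
        data_loss = 0.5 * np.mean(residual ** 2)  # J(w): mean squared error
        penalty = 0.5 * np.sum(w ** 2)            # Omega(w): squared L2 norm
        return data_loss + lam * penalty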

Types

  • L2 regularization (Ridge / weight decay)
  • L1 regularization (Lasso)
  • Elastic Net (L1 + L2)
  • Max-norm regularization (a constraint rather than an added penalty; each type is sketched below)
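
A sketch of each penalty in NumPy (the function names and the elastic-net mixing weight a are illustrative):

    import numpy as np

    def l2(w):
        return np.sum(w ** 2)       # Ridge / weight decay: shrinks all weights smoothly

    def l1(w):
        return np.sum(np.abs(w))    # Lasso: drives some weights to exactly zero (sparsity)

    def elastic_net(w, a):
        return a * l1(w) + (1 - a) * l2(w)  # convex mix of the L1 and L2 penalties

    def max_norm_project(w, c):
        # Max-norm acts as a constraint, not a penalty: project w back into ||w|| <= c.
        n = np.linalg.norm(w)
        return w if n <= c else w * (c / n)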

Neural Network Specific:
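
  • Dropout
  • Early stopping
  • Data augmentation
  • Weight decay (L2 on the weights)

A minimal dropout sketch (PyTorch; the layer sizes and p = 0.5 are illustrative):

    import torch.nn as nn

    # Dropout zeroes each activation with probability p during training,
    # which discourages co-adaptation; under model.eval() it is a no-op.
    model = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(256, 10),
    )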

From MATH213

Definition 2: Regularization

A function $\tilde{f}$ is the regularization of a function $f$ if

  1. $\tilde{f}(t) = f(t)$ for all $t$ in the domain of $f$.
  2. For all $t$ such that $\lim_{s \to t} f(s)$ exists but $f(t)$ is undefined, $\tilde{f}(t) = \lim_{s \to t} f(s)$.
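
For example, $f(t) = \frac{\sin t}{t}$ is undefined at $t = 0$, but $\lim_{s \to 0} \frac{\sin s}{s} = 1$, so the regularization $\tilde{f}$ agrees with $f$ everywhere else and sets $\tilde{f}(0) = 1$ (the sinc function).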

Theorem 1

If $\tilde{f}$ is the regularization of a function $f$ that has a finite number of discontinuities, then $\mathcal{L}^{-1}\{\mathcal{L}\{f\}\} = \tilde{f}$.

Also see Finite Zeros and Finite Poles.