Stochastic Gradient Descent (SGD)

Stochastic gradient descent estimates the gradient from a random sample of the training data instead of computing it over the full dataset.

SGD

  • Randomly select a subset of the training data at each step.
  • The gradient depends only on the selected subset.
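The two steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name sgd_fit_line, the learning rate, the batch size, and the synthetic noise-free data are all assumptions chosen for the example.

```python
import random

def sgd_fit_line(xs, ys, lr=0.05, batch_size=8, steps=2000, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on the mean squared error.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Randomly select a subset of the training data.
        batch = rng.sample(range(n), batch_size)
        # The gradient is computed from the selected subset only.
        grad_w = 0.0
        grad_b = 0.0
        for i in batch:
            err = w * xs[i] + b - ys[i]
            grad_w += 2.0 * err * xs[i] / batch_size
            grad_b += 2.0 * err / batch_size
        # Step against the sampled gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic, noise-free data generated from y = 3x + 1 (an assumption for the demo).
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [3.0 * x + 1.0 for x in xs]
w_hat, b_hat = sgd_fit_line(xs, ys)
```

Setting batch_size=1 gives the single-example variant, and setting it to len(xs) recovers full-batch gradient descent.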

The noise in the sampled gradient can help the optimizer avoid getting stuck in a local minimum.

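Strictly speaking, stochastic gradient descent is the special case of mini-batch gradient descent with a batch size of one, although the name SGD is often used for the mini-batch version as well.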

CS294

SGD minimizes expectations: for a differentiable function $f$ of $\theta$, SGD solves $\arg\min_{\theta} \mathbb{E}[f(\theta)]$. We can use this with maximum likelihood estimation.
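To make the link to maximum likelihood concrete, the objective can be written as an expected negative log-likelihood. The notation below is an assumption, since the source does not fix any: $p_{\theta}$ is the model density, $p_{\text{data}}$ the data distribution, $\eta$ the learning rate, and $B$ the mini-batch size.

```latex
% Maximum likelihood as minimizing an expectation, with f(\theta) = -\log p_{\theta}(x)
\hat{\theta} = \arg\min_{\theta} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[ -\log p_{\theta}(x) \right]

% One SGD step using a mini-batch x_1, \dots, x_B sampled from the training data
\theta_{t+1} = \theta_t + \eta \, \frac{1}{B} \sum_{i=1}^{B} \nabla_{\theta} \log p_{\theta}(x_i) \Big|_{\theta = \theta_t}
```

The plus sign in the update comes from descending on the negative log-likelihood, which is the same as ascending on the log-likelihood.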