Stochastic Gradient Descent (SGD)
Stochastic gradient descent uses a noisy, sampled estimate of the gradient instead of the full gradient.
- Randomly select a subset (mini-batch) of the training data at each step.
- The gradient is computed only on the selected subset.
- The noise from subsampling can help escape shallow local minima.
Classic stochastic gradient descent (batch size 1) is a special case of mini-batch gradient descent.
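The steps above can be sketched as follows, here for a simple linear regression fit with mean squared error; the data, learning rate, and batch size are hypothetical choices for illustration:

```python
import numpy as np

# Hypothetical data: y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0           # parameters to learn
lr, batch_size = 0.1, 16  # assumed hyperparameters

for step in range(500):
    # Randomly select a subset (mini-batch) of the training data.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx, 0], y[idx]
    # Gradient of mean squared error, computed only on the subset.
    err = w * xb + b - yb
    grad_w = 2.0 * np.mean(err * xb)
    grad_b = 2.0 * np.mean(err)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # should approach 2 and 1
```

Each step touches only `batch_size` points rather than the full dataset, which is where the computational savings and the gradient noise both come from.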
CS294
SGD minimizes expectations: for a differentiable function f(θ; x), SGD solves min_θ E_x[f(θ; x)]. We can use this with maximum likelihood estimation, taking f to be the negative log-likelihood of a data point.
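As a minimal sketch of the MLE connection, consider estimating the mean μ of a Gaussian with fixed σ = 1. The per-sample negative log-likelihood is 0.5·(x − μ)² up to a constant, whose gradient in μ is (μ − x), so each SGD step uses one sampled data point; the data and step size here are hypothetical:

```python
import numpy as np

# Hypothetical data drawn from a Gaussian with true mean 3.
rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=1.0, size=5000)

mu, lr = 0.0, 0.05
for step in range(5000):
    x = data[rng.integers(len(data))]  # sample one data point
    mu -= lr * (mu - x)                # stochastic gradient of the NLL

print(mu)  # should approach the sample mean, near 3
```

With a constant step size the iterate hovers in a small noise ball around the maximum-likelihood estimate; decaying the learning rate would let it converge.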