Stochastic Gradient Descent (SGD)

Stochastic gradient descent estimates the gradient from a random sample of the training data instead of computing it over the full dataset.

SGD

  • Randomly select a subset (minibatch) of training data at each step.
  • The gradient is computed only on that subset — a noisy but cheap estimate of the full-batch gradient.

The noise in the gradient estimate helps avoid getting stuck in poor local minima.

Stochastic gradient descent (one sample per step) is the special case of Mini-Batch Gradient Descent with batch size 1.
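A minimal sketch of the idea, using an assumed toy setup (1-D least squares, fitting w in y = w·x): the same loop does mini-batch gradient descent for any batch size, and plain SGD when the batch size is 1.

```python
import random

def minibatch_sgd(xs, ys, batch_size=1, lr=0.01, steps=1000, seed=0):
    """Fit w in y = w * x by mini-batch SGD; batch_size=1 is plain SGD."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(steps):
        batch = rng.sample(idx, batch_size)
        # Gradient of mean squared error over the sampled batch only.
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / batch_size
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]       # generated with w = 3
w = minibatch_sgd(xs, ys, batch_size=1)   # SGD: one sample per step
```

The function names and data here are illustrative, not from the lectures; the point is only that SGD and mini-batch GD differ in how many indices are sampled per step.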

CS294

SGD minimizes expectations: for a differentiable function $f(\theta; x)$ of parameters $\theta$, SGD solves $\min_\theta \, \mathbb{E}_x\!\left[f(\theta; x)\right]$ via the update $\theta \leftarrow \theta - \eta \, \nabla_\theta f(\theta; x_i)$ on a sampled $x_i$. We can use this with Maximum Likelihood Estimation, since maximizing the log-likelihood over the data is minimizing $\mathbb{E}_x\!\left[-\log p_\theta(x)\right]$.
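A small sketch of MLE via SGD under an assumed example (not from the lecture): estimating the mean of a Gaussian with known unit variance. The per-sample negative log-likelihood is 0.5·(x − μ)² plus a constant, so its gradient with respect to μ is (μ − x), and the SGD iterate drifts toward the sample mean.

```python
import random

def mle_mean_sgd(data, lr=0.05, steps=2000, seed=0):
    """MLE for a Gaussian mean (unit variance) by SGD on the NLL."""
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(steps):
        x = rng.choice(data)      # sample one data point
        mu -= lr * (mu - x)       # SGD step on 0.5 * (x - mu)**2
    return mu

data = [1.8, 2.1, 2.0, 1.9, 2.2]
mu_hat = mle_mean_sgd(data)       # approaches the sample mean (2.0)
```

With a fixed learning rate the iterate hovers near the MLE rather than converging exactly; a decaying step size would remove that residual noise.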

Three problems with vanilla SGD (CS231n)

  1. Poor conditioning — when the loss changes quickly along one direction and slowly along another (high-condition-number Hessian), SGD jitters along the steep axis and crawls along the shallow one. Condition number = ratio of largest to smallest singular value of the Hessian.

  2. Local minima and saddle points — at saddle points the gradient is zero, so vanilla SGD stalls. Saddle points are much more common than local minima in high dimensions — a critical point in N dimensions is a local min only if all N Hessian eigenvalues are positive, which becomes exponentially unlikely as N grows. (Dauphin et al. 2014)

  3. Noisy gradients — minibatches give a stochastic estimate of the true gradient. Trajectories meander even on convex losses.

All three are addressed by adding velocity (momentum) and/or per-parameter learning rates (RMSProp, Adam) — see Gradient Descent for the optimizer family with code.
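As a minimal sketch of the velocity idea (the full optimizer family lives in the Gradient Descent note), the momentum update below runs on an assumed toy quadratic loss f(w) = 0.5·w² with gradient w; the names are illustrative.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, rho=0.9, steps=100):
    """SGD with momentum: accumulate a velocity, step along it."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = rho * v - lr * grad_fn(w)   # decay old velocity, add new gradient
        w = w + v                        # parameter moves along the velocity
    return w

# Toy quadratic: f(w) = 0.5 * w**2, so grad(w) = w.
w_final = sgd_momentum(lambda w: w, w0=5.0)
```

The velocity averages gradients over time, which damps the jitter along steep directions (problem 1), carries the iterate through zero-gradient saddle regions (problem 2), and smooths minibatch noise (problem 3).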

Source

CS231n Lec 3 slides 52–61 (SGD update rule, three problems with vanilla SGD).