Gradient Problems
I’ve always been hearing about these problems and never understood them! I finally understand some of them now, and it feels great :))
The gradient “vanishes”
Saturated Neuron / Dead Neuron
We say that a neuron is saturated when the derivative of its activation function is (nearly) 0 at its current input, so no gradient flows back through it and that part of the network stops learning. Imagine, for example, the tanh(x) function when x is extremely big: tanh(x) is basically 1 and its derivative is basically 0.
See Activation Function, which discusses these activation functions.
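A minimal numeric sketch of saturation (plain NumPy, the sample x values are just illustrative): once x gets large, the derivative 1 - tanh(x)^2 is essentially zero, so nothing flows back.

```python
import numpy as np

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

for x in [0.0, 2.0, 5.0, 20.0]:
    print(f"x = {x:5.1f}   tanh'(x) = {tanh_grad(x):.2e}")

# x =   0.0   tanh'(x) = 1.00e+00
# x =   2.0   tanh'(x) = 7.07e-02
# x =   5.0   tanh'(x) = 1.82e-04
# x =  20.0   tanh'(x) = 0.00e+00   <- saturated: no gradient left
```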
Dead ReLU Neuron = a neuron that never activates: its output is always 0, so no gradient flows through it and its weights never get updated. This can happen when an overly large gradient update (e.g. from a too-high learning rate) pushes the weights so far that the pre-activation is negative for every input. It’s like permanent brain damage.
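A small NumPy sketch of a dead ReLU, with made-up weights and inputs: pretend a huge update already pushed the weights and bias far negative, so the pre-activation is below 0 for every example, the output is always 0, and the gradient reaching the weights is 0 too, so the neuron can never recover.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))        # hypothetical batch of inputs

# Parameters after a (hypothetical) huge update pushed them far negative
w = np.array([-1.0, -1.0, -1.0, -1.0])
b = -50.0

z = X @ w + b                         # pre-activation: negative for every row
a = np.maximum(z, 0.0)                # ReLU output: all zeros

# The ReLU passes gradient only where z > 0, so (taking the upstream
# gradient as 1 per example) the gradient w.r.t. w is zero everywhere --
# no future update can revive this neuron.
relu_mask = (z > 0).astype(float)
grad_w = X.T @ relu_mask / len(X)
print("fraction of inputs that activate:", relu_mask.mean())  # 0.0
print("gradient w.r.t. w:", grad_w)                           # [0. 0. 0. 0.]
```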
The Vanishing Gradient Problem exists because backpropagation applies the chain rule layer by layer: the gradient that reaches the early layers is a product of many per-layer factors (the weight matrices and the activation derivatives). With saturating activations like tanh or sigmoid, each derivative factor is at most 1 (at most 0.25 for sigmoid), so the product shrinks exponentially with depth and the early layers get basically no gradient.
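A rough NumPy sketch of that multiplication effect (the depth, width, and weight scale are arbitrary choices for illustration, not from any real network): backprop through a deep tanh stack multiplies many per-layer factors together, and the gradient norm at the input collapses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 50, 64              # arbitrary depth/width for illustration

# Forward pass through a deep stack of tanh layers with small random weights
Ws = [rng.normal(scale=0.5 / np.sqrt(width), size=(width, width))
      for _ in range(n_layers)]
h = rng.normal(size=width)
hs = []
for W in Ws:
    h = np.tanh(W @ h)
    hs.append(h)

# Backward pass: at every layer the chain rule multiplies by tanh'(z) <= 1
# and by W^T, so the gradient keeps shrinking as it travels backwards.
grad = np.ones(width)                 # pretend dL/d(output) = 1 everywhere
for W, h in zip(reversed(Ws), reversed(hs)):
    grad = W.T @ (grad * (1.0 - h ** 2))

print("gradient norm reaching the input:", np.linalg.norm(grad))
# many orders of magnitude below 1: the earliest layers learn essentially nothing
```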