Activation Function
Activation functions introduce non-linearity into a Neural Network; without them, stacked layers would collapse into a single linear transformation.
Several common activation functions (a NumPy sketch of them follows the list):
- Sigmoid Function
  - See its page for drawbacks; it is rarely used in practice anymore
- Tanh Function
  - Compared to Sigmoid it still kills gradients when saturated, but it is zero-centered
- Rectified Linear Unit (ReLU)
  - Does not saturate (in the positive region)
  - Very computationally efficient
  - Converges much faster than sigmoid/tanh in practice
  - Actually more biologically plausible than sigmoid
  - Drawbacks: not zero-centered, and units whose pre-activation stays negative get a gradient of 0 and stop updating, i.e. "dead ReLU"
- Leaky ReLU
  - Small negative slope (e.g. 0.01x for x < 0) keeps the gradient alive, so units do not die
- Parametric ReLU (PReLU)
  - Like Leaky ReLU, but the negative slope is learned
- Exponential Linear Units (ELU)
  - All the benefits of ReLU
  - Closer to zero-mean outputs
  - Compared with Leaky ReLU, its negative saturation regime adds some robustness to noise
- Maxout Neuron
  - Generalizes ReLU and Leaky ReLU by taking the max of several linear functions, but doubles the number of parameters per neuron
- Softmax Function
  - Usually applied at the output layer to turn raw scores into a probability distribution over classes
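
A minimal NumPy sketch of the functions listed above (the function names, the `alpha` defaults, and the test vector are illustrative choices of this note, not fixed by any library; the formulas are the standard definitions):

```python
import numpy as np

def sigmoid(x):
    # Squashes to (0, 1); saturates for large |x| and is not zero-centered.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered, but still saturates for large |x|.
    return np.tanh(x)

def relu(x):
    # No saturation in the positive region; very cheap to compute.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a nonzero gradient for x < 0.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smoothly saturates toward -alpha for very negative inputs.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def softmax(x):
    # Turns a score vector into a probability distribution.
    shifted = x - np.max(x)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))     # [0. 0. 0. 1. 3.]
print(softmax(x))  # non-negative entries that sum to 1
```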
In general, just use ReLU, but be careful with learning rates: an oversized update can push units into the dead-ReLU regime (see the sketch below).
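
To illustrate the dead-ReLU drawback and why the learning-rate warning matters, here is a hypothetical toy sketch (single unit, made-up data and variable names): once the pre-activation is negative for every input, the gradient through the ReLU is exactly zero, so gradient descent can never revive the unit.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 where z > 0, 0 elsewhere.
    return (z > 0).astype(float)

# Toy regression data: all inputs are positive.
x = np.array([0.5, 1.0, 1.5, 2.0])
t = 2.0 * x  # targets

# A unit whose bias was pushed far negative (e.g. by one oversized update),
# so w*x + b < 0 for every training input.
w, b, lr = 1.0, -10.0, 0.1

for _ in range(100):
    z = w * x + b                               # pre-activation (all negative here)
    y = relu(z)                                 # output is 0 for every sample
    dz = 2.0 * (y - t) / len(x) * relu_grad(z)  # MSE gradient through the ReLU
    w -= lr * np.dot(dz, x)                     # gradient is exactly 0 ...
    b -= lr * np.sum(dz)                        # ... so w and b never change

print(w, b)  # still 1.0 -10.0: the unit is "dead" and cannot recover
```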