# Activation Function

Activation functions introduce non-linearity into a neural network; without them, a stack of layers would collapse into a single linear transformation.

Several activation functions:

- Sigmoid Function: $\sigma(x)=\frac{1}{1+e^{-x}}$
- See its own page for the drawbacks (saturated neurons kill gradients, outputs are not zero-centered); in practice it is rarely used anymore

- $\tanh(x)$
- Like sigmoid, it still kills gradients when saturated, but unlike sigmoid it is zero-centered (both are sketched below)
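
As a rough illustration of the saturation problem, here is a minimal NumPy sketch (the helper names are my own, not from any library) of the sigmoid and tanh gradients; for large $|x|$ both derivatives go to roughly 0, which is what “kills” the gradient:

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)); outputs in (0, 1), not zero-centered
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # peaks at 0.25 near x = 0

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2    # peaks at 1.0 near x = 0

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(sigmoid_grad(x))  # ~0 at |x| = 10: saturated neurons "kill" the gradient
print(tanh_grad(x))     # same saturation problem, but tanh itself is zero-centered
```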

- Rectified Linear Unit (ReLU): $f(x)=\max(0,x)$
- Does not saturate (in positive region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice
- Actually more biologically plausible than sigmoid
- Drawbacks: the output is not zero-centered, and for $x \le 0$ the gradient is exactly 0, so a neuron that only ever receives negative inputs stops updating, i.e. a “dead ReLU” (see the sketch below)
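
A minimal sketch of ReLU and its gradient (my own helper names, not a library API); the gradient is exactly 0 in the negative region, which is the mechanism behind dead ReLUs:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 in the positive region (no saturation there), exactly 0 otherwise
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))       # [0.  0.  0.  0.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- a unit stuck in the negative region never updates
```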

- Leaky ReLU: $f(x)=\max(0.01x,x)$
- Parametric ReLU (PReLU): $f(x)=\max(\alpha x,x)$, where the slope $\alpha$ is learned
- Exponential Linear Units (ELU)

$$f(x) = \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$$

- All benefits of ReLU
- Closer to zero mean outputs
- Compared with Leaky ReLU, the negative saturation regime adds some robustness to noise (Leaky ReLU and ELU are sketched below)
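
A hedged NumPy sketch of these variants (the helper names and the default $\alpha = 1$ are my own choices for illustration):

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # PReLU has the same form, but the slope is a learned parameter
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # Linear for x > 0; saturates smoothly toward -alpha for x <= 0
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(leaky_relu(x))  # small non-zero outputs (and gradients) for negative inputs
print(elu(x))         # negative inputs saturate toward -1, pushing mean outputs closer to zero
```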

- Maxout Neuron: $f(x)=\max(w_1^{T}x+b_1,\ w_2^{T}x+b_2)$ generalizes ReLU and Leaky ReLU, but doubles the parameters per neuron
- Softmax Function: typically used on the output layer to turn scores into class probabilities (both sketched below)
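
For completeness, a small illustrative sketch of a maxout neuron and a numerically stable softmax (the max-subtraction trick is standard; the helper names and toy sizes are mine):

```python
import numpy as np

def softmax(z):
    # Subtracting the max leaves the result unchanged but avoids overflow in exp
    e = np.exp(z - np.max(z))
    return e / e.sum()

def maxout(x, W, b):
    # Maxout neuron: the max over k linear pieces, f(x) = max_i (W[i] @ x + b[i])
    return np.max(W @ x + b)

print(softmax(np.array([2.0, 1.0, 0.1])))           # probabilities that sum to 1
rng = np.random.default_rng(0)
W, b = rng.normal(size=(2, 3)), rng.normal(size=2)  # k = 2 pieces, 3 inputs (toy sizes)
print(maxout(rng.normal(size=3), W, b))
```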

In general, just use ReLU, but be careful with learning rates, since a large learning rate makes dead ReLUs more likely.