Simpler alternative to the LSTM (Cho et al. 2014). Merges the input and forget gates into a single update gate z and drops the separate cell state; only the hidden state propagates.
### Equations
From CS231n 2024 Lec 7 (Cho et al. 2014 notation):
$$\begin{aligned}
r_t &= \sigma(W_{xr} x_t + W_{hr} h_{t-1} + b_r) \quad &\text{(reset gate)} \\
z_t &= \sigma(W_{xz} x_t + W_{hz} h_{t-1} + b_z) \quad &\text{(update gate)} \\
\tilde{h}_t &= \tanh(W_{xh} x_t + W_{hh}(r_t \odot h_{t-1}) + b_h) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}$$
The convex combination $h_t = z_t \odot h_{t-1} + (1-z_t) \odot \tilde{h}_t$ is the key gradient-flow path: when $z_t \approx 1$, the old state passes through unchanged, like an LSTM with its forget gate open. The GRU has fewer parameters than the LSTM (three gate/candidate weight sets instead of four) while reaching comparable performance on many tasks.
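As a concrete illustration, below is a minimal NumPy sketch of a single GRU step that implements the equations above directly. The weight names (`W_xr`, `W_hr`, ..., `b_h`) mirror the notation here, and the `gru_step` helper plus the random-initialization snippet are assumptions for demonstration, not code from CS231n or Cho et al.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU step; params holds hypothetical weights W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h."""
    r_t = sigmoid(params["W_xr"] @ x_t + params["W_hr"] @ h_prev + params["b_r"])              # reset gate
    z_t = sigmoid(params["W_xz"] @ x_t + params["W_hz"] @ h_prev + params["b_z"])              # update gate
    h_tilde = np.tanh(params["W_xh"] @ x_t + params["W_hh"] @ (r_t * h_prev) + params["b_h"])  # candidate state
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde  # convex combination: z_t near 1 keeps the old state
    return h_t

# Illustrative usage: random weights, unroll over a short sequence.
D, H = 4, 3  # input and hidden sizes (arbitrary)
rng = np.random.default_rng(0)
params = {k: rng.standard_normal((H, D if "x" in k else H)) * 0.1
          for k in ["W_xr", "W_hr", "W_xz", "W_hz", "W_xh", "W_hh"]}
params.update({b: np.zeros(H) for b in ["b_r", "b_z", "b_h"]})
h = np.zeros(H)
for x in rng.standard_normal((5, D)):
    h = gru_step(x, h, params)
```

Note that, unlike the LSTM, there is no separate cell state to carry between steps: the single vector `h` is both the memory and the output.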
### Source
CS231n 2024 Lec 7 slide 122 (GRU equations from Cho et al. 2014, shown alongside LSTM variants MUT1/MUT2/MUT3 from Jozefowicz et al. 2015).
### Related
- [[notes/Long Short-Term Memory|LSTM]]
- [[notes/Recurrent Neural Network|RNN]]