Implicit Q-Learning
Saw this in: https://arxiv.org/pdf/2505.08078
In this work, we primarily consider the Implicit Q-Learning (IQL) [20] value objectives given its effectiveness on a range of tasks. IQL aims to fit a value function by estimating expectiles τ with respect to actions within the support of the data, and then uses the value function to update the
Q-function. To do so, it aims to minimize the following objectives for learning a parameterized Q-function $Q_\phi$ (with target Q-function $Q_{\phi'}$) and value function $V_\psi$:

$$L_V(\psi) = \mathbb{E}_{(s,a)\sim\mathcal{D}}\Big[L_2^\tau\big(Q_{\phi'}(s,a) - V_\psi(s)\big)\Big]$$

$$L_Q(\phi) = \mathbb{E}_{(s,a,s')\sim\mathcal{D}}\Big[\big(r(s,a) + \gamma V_\psi(s') - Q_\phi(s,a)\big)^2\Big]$$

- where $L_2^\tau(u) = |\tau - \mathbb{1}(u < 0)|\,u^2$ is the asymmetric (expectile) squared loss.
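A minimal PyTorch sketch of these two objectives. The names `q_net`, `q_target`, `v_net`, and the default `tau = 0.7` are illustrative assumptions rather than anything specified in either paper, and terminal-state handling is omitted:

```python
# Sketch of the IQL value objectives, assuming illustrative PyTorch
# function approximators q_net(s, a), q_target(s, a), and v_net(s).
import torch
import torch.nn.functional as F

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric L2 loss: L_2^tau(u) = |tau - 1(u < 0)| * u^2."""
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def value_loss(v_net, q_target, s, a, tau: float = 0.7) -> torch.Tensor:
    """L_V(psi): expectile regression of V(s) toward Q_target(s, a)."""
    with torch.no_grad():
        q = q_target(s, a)                      # target Q, no gradient
    return expectile_loss(q - v_net(s), tau)

def q_loss(q_net, v_net, s, a, r, s_next, gamma: float = 0.99) -> torch.Tensor:
    """L_Q(phi): TD regression toward r + gamma * V(s')."""
    with torch.no_grad():
        target = r + gamma * v_net(s_next)      # bootstrap through V, not max_a Q
    return F.mse_loss(q_net(s, a), target)      # terminal flags omitted in this sketch
```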
Original paper:
- It learns a Q-function from the offline dataset.
- It derives a value function by “aggregating” the learned Q-function across actions (via expectile regression rather than a hard max).
- It then trains the policy to imitate the best actions in the dataset, using advantage-weighted updates; see the sketch below.
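A hedged sketch of that advantage-weighted policy step: the `policy.log_prob(s, a)` interface, the inverse temperature `beta`, and the weight clipping are assumptions chosen for illustration, not details quoted from either paper.

```python
# Sketch of advantage-weighted policy extraction: weight the log-likelihood
# of dataset actions by exp(beta * advantage) under the learned Q and V.
import torch

def awr_policy_loss(policy, q_target, v_net, s, a,
                    beta: float = 3.0, max_weight: float = 100.0) -> torch.Tensor:
    with torch.no_grad():
        adv = q_target(s, a) - v_net(s)                   # A(s, a) = Q(s, a) - V(s)
        w = torch.exp(beta * adv).clamp(max=max_weight)   # clipped for numerical stability
    # Maximize the weighted log-likelihood of dataset actions (minimize its negative),
    # so the policy imitates in-dataset actions with high estimated advantage.
    return -(w * policy.log_prob(s, a)).mean()
```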