Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Good survey of the open problems.
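Off-policy policy gradient is a thing via importance sampling: reweight returns collected under the behavior policy by the ratio of target-policy to behavior-policy action probabilities.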
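To make the importance-sampling note concrete, here is a minimal sketch, not taken from the survey. It assumes a one-step (bandit) setting with a softmax target policy and a known behavior policy; all function and variable names are hypothetical. For multi-step trajectories the single ratio would be replaced by a product of per-step ratios, which is where the variance problems come from.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_policy_gradient(logits, data, behavior_probs):
    """Importance-sampled estimate of d E_pi[r] / d logits.

    data: list of (action, reward) pairs sampled from the behavior policy.
    behavior_probs: behavior policy's action probabilities (assumed known).
    """
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in data:
        # Importance weight: target probability over behavior probability.
        w = pi[action] / behavior_probs[action]
        for k in range(len(logits)):
            # Score function of a softmax policy: 1[k == a] - pi_k.
            score = (1.0 if k == action else 0.0) - pi[k]
            grad[k] += w * reward * score
    return [g / len(data) for g in grad]


def exact_policy_gradient(logits, mean_rewards):
    """Closed-form gradient for comparison: pi_k * (r_k - J)."""
    pi = softmax(logits)
    j = sum(p * r for p, r in zip(pi, mean_rewards))
    return [pi[k] * (mean_rewards[k] - j) for k in range(len(logits))]


if __name__ == "__main__":
    random.seed(0)
    logits = [0.5, -0.2, 0.1]          # target policy parameters (made up)
    mean_rewards = [1.0, 0.0, 2.0]     # deterministic rewards (made up)
    behavior = [1 / 3, 1 / 3, 1 / 3]   # uniform behavior policy

    # Collect an offline dataset from the behavior policy only.
    actions = random.choices(range(3), weights=behavior, k=20000)
    data = [(a, mean_rewards[a]) for a in actions]

    print("IS estimate:", is_policy_gradient(logits, data, behavior))
    print("exact      :", exact_policy_gradient(logits, mean_rewards))
```

The estimate is unbiased as long as the behavior policy puts nonzero probability on every action the target policy can take. Its variance grows as the two policies diverge, since the weights become large.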