Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

Good survey of the open problems.

off-policy policy gradient is a thing via Importance Sampling.