Stitching (Reinforcement Learning)

This is a term that is used quite often in Sergey Levine’s papers on RL.

I think the original paper may be “Maximum Entropy Inverse Reinforcement Learning”: https://cdn.aaai.org/AAAI/2008/AAAI08-227.pdf

“Offline RL algorithms can “stitch” suboptimal trajectories together: while the trajectories τ_i in the offline dataset might attain poor return, a better policy can be obtained by combining good segments of trajectories (A→E + E→D = A→D). This ability to stitch segments of trajectories temporally is the hallmark of value-based offline RL algorithms that utilize Bellman backups, but cloning (a subset of) the data or trajectory-level sequence models are unable to extract this information, since no single trajectory from A to D is observed in the offline dataset!”
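
A minimal sketch of the idea (my own toy example, not from the quoted paper): the offline dataset contains one trajectory A→B→E that never reaches the goal and one trajectory E→C→D that does. The states, actions, and rewards here are hypothetical, and the Bellman backup is done with plain tabular Q-iteration over the fixed dataset — but it shows how value propagates from D back through the shared state E to A, even though no single logged trajectory goes from A to D.

```python
from collections import defaultdict

# Offline dataset: two trajectories that only overlap at state E.
#   trajectory 1: A -> B -> E          (never reaches the goal, return 0)
#   trajectory 2: E -> C -> D (goal)   (reward 1 on entering D)
# No single trajectory goes from A to D.
dataset = [
    # (state, action, reward, next_state, done)
    ("A", "right", 0.0, "B", False),
    ("B", "right", 0.0, "E", False),
    ("E", "up",    0.0, "C", False),
    ("C", "up",    1.0, "D", True),
]

gamma = 0.99
Q = defaultdict(float)

# Tabular Q-iteration: repeatedly sweep the fixed offline dataset and apply
# Bellman backups until the Q-values converge.
for _ in range(100):
    for s, a, r, s2, done in dataset:
        if done:
            target = r
        else:
            # max over actions actually observed at s2 in the dataset
            next_actions = [a2 for (s_, a2, *_) in dataset if s_ == s2]
            target = r + gamma * max(Q[(s2, a2)] for a2 in next_actions)
        Q[(s, a)] = target

# Reward information from D has propagated backwards through E to A:
print(Q[("A", "right")])  # ~0.97 (= gamma**3 * 1.0)

# Behavior cloning of trajectory 1 alone would just imitate A -> B -> E and
# stop there; it has no mechanism to combine it with the E -> C -> D segment.
```

The greedy policy with respect to these Q-values follows A→B→E→C→D, i.e. the stitched path, which is exactly the behavior the quote attributes to value-based offline RL.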