Fully Sharded Data Parallel (FSDP)
https://engineering.fb.com/2021/07/15/open-source/fsdp/
There is also a paper, "PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel", that you should read if you really want to understand what’s going on.