RT-1: Robotics Transformer for Real-World Control at Scale
My question: how is it trained on datasets that have no instruction annotations?
- In that case, the instruction is simply left empty.
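A minimal sketch of what "the instruction is just empty" could look like when building training examples. It assumes a hypothetical episode record (`Episode`, with `frames` and an optional `instruction`) and hypothetical helpers `instruction_or_empty` / `make_example`. None of these names come from the RT-1 code. Unannotated episodes get an empty string, so every example has the same fields:

```python
from typing import Optional, TypedDict


class Episode(TypedDict, total=False):
    # Hypothetical episode record: a list of frames plus an optional instruction.
    frames: list
    instruction: Optional[str]


def instruction_or_empty(episode: Episode) -> str:
    """Return the episode's instruction, or "" when the dataset has none."""
    text = episode.get("instruction")
    return text if text else ""


def make_example(episode: Episode) -> dict:
    # Every example carries a text field, so annotated and unannotated
    # episodes share one batch layout and the same language encoder.
    return {"frames": episode["frames"], "instruction": instruction_or_empty(episode)}


annotated = make_example({"frames": [0, 1], "instruction": "open the drawer"})
unannotated = make_example({"frames": [0, 1]})
print(repr(unannotated["instruction"]))
```

The design choice here is to keep the input shape fixed and let the empty string be encoded like any other instruction, instead of branching the model on whether text exists.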
“It takes in a history of 15 images along with the natural language”.
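A rough sketch of pairing each timestep with its image history and the instruction, using the history length of 15 from the quote above. The function name `history_windows` is hypothetical. Padding the start of an episode by repeating the first frame is also an assumption for illustration, not something stated in these notes:

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def history_windows(
    frames: Sequence[Frame], instruction: str, history: int = 15
) -> List[Tuple[List[Frame], str]]:
    """For each timestep t, pair the last `history` frames up to t with the instruction."""
    if not frames:
        return []
    # Assumed padding: fill the window with copies of the first frame so early
    # timesteps still get a full-length history.
    window: Deque = deque([frames[0]] * history, maxlen=history)
    out = []
    for frame in frames:
        window.append(frame)  # maxlen drops the oldest frame automatically
        out.append((list(window), instruction))
    return out


windows = history_windows(list(range(20)), "open the drawer")
print(windows[-1][0])
```

With 20 frames, the last window holds frames 5 through 19. A fixed-length window keeps the model input the same size at every step.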