Compositionality
This is something that Sergey emphasizes a lot.
Compositionality means being able to combine familiar skills, objects, or subgoals in novel ways that were not explicitly seen in training, and doing so with the correct logic and causal order.
“If the model knows how to open a drawer, and it knows how to put a spoon in a container, can it figure out how to put the spoon in the drawer?”
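As a toy illustration (the skill names and the `Robot` interface are hypothetical, not from any real robot stack), composing the two known primitives in the right causal order would look something like:

```python
class Robot:
    def do(self, command: str):
        print("executing:", command)   # stand-in for a real controller

def open_drawer(robot):
    # Primitive seen in training.
    robot.do("open drawer")

def place(robot, obj, container):
    # Primitive seen in training, e.g. place(robot, "spoon", "cup").
    robot.do(f"put {obj} in {container}")

def put_spoon_in_drawer(robot):
    # The novel combination: never demonstrated end to end, but it follows
    # from the two primitives if the causal order is respected
    # (the drawer must be open before anything can go into it).
    open_drawer(robot)
    place(robot, "spoon", "drawer")

put_spoon_in_drawer(Robot())
```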
Can VLAs solve composition?
Do LLMs learn composition?
Transformers can simulate compositional reasoning because:
- They use attention to dynamically bind tokens (roughly analogous to variable binding; see the sketch after this list)
- They can represent hierarchical dependencies in positional/attention patterns
- They get trained on massive data full of natural compositions (language itself is compositional)
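A minimal sketch of the first point, using hand-built toy embeddings (the vectors and names are illustrative assumptions, not from any trained model): the same content-based attention pattern binds a "cube" query to whichever color token happens to be present, which is what makes attention roughly analogous to variable binding.

```python
import numpy as np

# Minimal single-head attention: weights come from content similarity,
# so a query can "bind" to whichever key matches it, regardless of position.
def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # content-based similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # softmax over keys
    return weights @ V, weights

# Toy 4-d embeddings (hypothetical): the first dimension is a "color" feature.
color_dir = np.array([1., 0., 0., 0.])
red   = color_dir + np.array([0., 1., 0., 0.])
blue  = color_dir + np.array([0., 0., 1., 0.])
table = np.array([0., 0., 0., 1.])                   # not a color; should get low weight

cube_query = color_dir[None, :]                      # "which token is my color?"

keys = np.stack([red, table])                        # "red cube on the table"
_, w_red = attention(cube_query, keys, keys)

keys = np.stack([blue, table])                       # "blue cube on the shelf"
_, w_blue = attention(cube_query, keys, keys)

# In both cases the color token gets the larger weight: the same attention
# pattern re-binds to a new filler without any new training.
print(w_red, w_blue)
```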
Because language is massively redundant and compositional, models can often cover compositional cases by interpolating between similar phrases they have seen: “If you know ‘two red cubes on the table’ and ‘three blue cubes on the shelf,’ you can interpolate to ‘two blue cubes on the shelf’.”
That looks compositional, but it’s statistical interpolation, not systematic recombination.
LLMs essentially learn compositionality by brute-force coverage, not by abstract structure.
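One way to make the distinction concrete, sketched below with an illustrative phrase template and held-out sets (assumptions for the sketch, not taken from any specific benchmark): an interpolation-style split leaves plenty of nearby combinations in training, while a systematic split removes every phrase containing a held-out pairing, so brute-force coverage alone cannot solve it.

```python
from itertools import product

counts    = ["two", "three"]
colors    = ["red", "blue"]
locations = ["on the table", "on the shelf"]

# All combinations of the three slots.
all_phrases = {f"{c} {col} cubes {loc}"
               for c, col, loc in product(counts, colors, locations)}

# Interpolation split: hold out a single novel combination of seen parts.
# Neighbouring phrases ("two red cubes on the shelf", "three blue cubes on
# the shelf", ...) remain in training, so the gap is easy to fill statistically.
interp_held_out = {"two blue cubes on the shelf"}
interp_train    = all_phrases - interp_held_out

# Systematic split: hold out *every* phrase pairing "blue" with "shelf",
# so no nearby combination is available to memorize or interpolate from.
systematic_held_out = {p for p in all_phrases if "blue" in p and "shelf" in p}
systematic_train    = all_phrases - systematic_held_out

print(len(interp_train), sorted(interp_held_out))
print(len(systematic_train), sorted(systematic_held_out))
```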
In self-driving, do you have the same problem?