The Chain Rule

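If z depends on y, and y in turn depends on x, then in Leibniz notation:

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$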

The chain rule is the basis for backpropagation.

Intuitively, the chain rule states that knowing the instantaneous rate of change of z relative to y and that of y relative to x allows one to calculate the instantaneous rate of change of z relative to x as the product of the two rates of change.

As put by George F. Simmons: “if a car travels twice as fast as a bicycle and the bicycle is four times as fast as a walking man, then the car travels 2 × 4 = 8 times as fast as the man.”
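As a quick sanity check of the product-of-rates idea, here is a minimal sketch that compares the chain rule against a finite-difference estimate. The functions y = x^2 and z = sin(y), and all helper names, are assumptions chosen for illustration only.

```python
import math


def y_of_x(x):
    # Inner function (illustrative choice): y = x^2
    return x * x


def z_of_y(y):
    # Outer function (illustrative choice): z = sin(y)
    return math.sin(y)


def chain_rule_derivative(x):
    # dz/dx = (dz/dy) * (dy/dx), with dz/dy evaluated at y = y_of_x(x)
    dz_dy = math.cos(y_of_x(x))
    dy_dx = 2.0 * x
    return dz_dy * dy_dx


def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only as an independent check
    return (f(x + h) - f(x - h)) / (2.0 * h)


x0 = 1.5
analytic = chain_rule_derivative(x0)
numeric = numeric_derivative(lambda x: z_of_y(y_of_x(x)), x0)
print(analytic, numeric)  # the two values should agree to several decimal places
```

The two printed numbers should agree closely, which is the chain rule in action: the rate of z with respect to x is the product of the two intermediate rates.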

Multivariate Chain Rule

Chain Rule for Paths

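The most basic form of the Chain Rule for multivariate calculus is written as

$$\frac{d}{dt} f(\mathbf{c}(t)) = \nabla f(\mathbf{c}(t)) \cdot \mathbf{c}'(t)$$

where c(t) is a differentiable path and f is a differentiable scalar function of several variables.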

When we have multiple variables, we draw a chain of dependencies from each input variable to the output, one chain per path.

To know how much one thing affects the final output, add up its effect through every path it can take.

For multivariable cases, the key idea is:

Each path contributes:
“how final changes with intermediate” × “how intermediate changes with original”

Then you sum all paths.
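A small worked example may make the path sum concrete. The functions here are assumptions chosen for illustration: let z = u · v with u = x^2 and v = 3x, so x reaches z through two paths.

$$\frac{dz}{dx} = \frac{\partial z}{\partial u}\frac{du}{dx} + \frac{\partial z}{\partial v}\frac{dv}{dx} = (v)(2x) + (u)(3) = (3x)(2x) + (x^2)(3) = 9x^2$$

Substituting directly gives z = 3x^3, whose derivative is also 9x^2, so the sum over both paths matches.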

Put more precisely:

The chain rule tracks influence through a computation graph.
If there is one path from x to the output, multiply derivatives along that path.
If there are multiple paths, add the contributions from all paths.

This is exactly why autograd accumulates gradients with += instead of overwriting them:

x.grad += local_derivative * out.grad
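The accumulation step above can be sketched as a tiny reverse-mode autograd. This is a minimal illustration, not any particular library's API: the `Value` class and its `_parents` and `_backward` attributes are hypothetical names, and only addition and multiplication are supported.

```python
class Value:
    """Hypothetical scalar graph node, for illustration only."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0                  # accumulated d(output)/d(self)
        self._parents = parents
        self._backward = lambda: None    # leaf nodes have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # Local derivative of a sum is 1 for each operand
            self.grad += 1.0 * out.grad
            other.grad += 1.0 * out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Local derivative of a product is the other operand
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so each node's grad is complete
        # before it is pushed further back to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0                  # d(output)/d(output)
        for node in reversed(order):
            node._backward()


x = Value(3.0)
three = Value(3.0)
z = x * x + three * x   # x reaches z through three edges
z.backward()
print(x.grad)           # 9.0, matching dz/dx = 2x + 3 at x = 3
```

If `+=` were replaced with `=`, each path would overwrite the previous contribution and `x.grad` would keep only the last path's value rather than the sum over all paths.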