Tail Recursion
A recursive function is tail-recursive if the recursive call is the last thing executed, with no pending work after it returns.
Why care?
Without tail call optimization (TCO), each recursive call pushes a new Stack Frame, so deep recursion blows the stack. With TCO, the compiler reuses the caller’s frame, giving O(1) space instead of O(n), and turning the recursion into a loop under the hood.
Tail recursion is recursion with no pending work. https://www.youtube.com/watch?v=_JtPhF8MshA&ab_channel=Computerphile
Pending work vs tail position
“Tail position” means the recursive result is returned directly, unmodified. Any arithmetic, branching, or cleanup left to do after the call disqualifies it.
// NOT tail-recursive: must multiply by n AFTER the call returns
int fact(int n) {
if (n == 0) return 1;
return n * fact(n - 1); // pending multiply
}
// Tail-recursive: acc carries the partial result, nothing pending
int fact_tail(int n, int acc) {
if (n == 0) return acc;
return fact_tail(n - 1, n * acc);
}The non-tail version must remember n to do the multiply, so its frame has to stick around. The tail version threads state through the accumulator, so once the call is made there is nothing left for the caller to do.
Under the hood
A normal call instruction:
- Pushes the return address
- Saves the caller’s frame pointer
- Allocates a fresh frame for the callee’s locals and arguments
With TCO, since the caller has no pending work, the compiler:
- Overwrites the current frame’s arguments in place with the new call’s arguments
- Replaces
callwithjmpto the function’s entry - Skips pushing a new return address. When the recursion eventually hits the base case,
retreturns directly to the original caller, not to an intermediate recursive frame
Net effect: the recursion compiles to the same machine code as a while loop. Stack depth stays constant no matter how many iterations.
Fibonacci
The classic fib(n) = fib(n-1) + fib(n-2) is NOT tail-recursive: the + is pending after both calls, so even with TCO the compiler cannot eliminate the frames (plus it is O(2^n) time).
A tail-recursive rewrite threads two accumulators:
// call as fib(n, 0, 1)
int fib(int n, int a, int b) {
if (n == 0) return a;
return fib(n - 1, b, a + b);
}This runs in O(n) time and, with TCO, O(1) space, identical to the iterative two-variable version.
Language support
- Scheme, Haskell, OCaml, Erlang guarantee TCO by the language spec
- GCC and Clang apply it in C/C++ at
-O2when they can prove it is safe, but no guarantee - Rust does not guarantee TCO. LLVM often applies it, but there is no
becomekeyword (yet) - Python and Java deliberately omit TCO. Guido rejected it to keep clean stack traces for debugging
Don't rely on TCO unless your language guarantees it
Portable code that needs O(1) space should just use a loop.