Head-to-Head Autonomous Racing

This is my research project for F1TENTH under Professor Yash Pant. It is currently on pause so I can focus on the F1TENTH Competition.

See F1TENTH Research Proposal.

How people overtake

From ChatGPT: In most racing events, drivers are advised to keep a distance of at least one car length between their car and the car in front of them. This distance allows for sufficient reaction time in case the car in front suddenly slows down or makes a sudden move.

However, in some racing events such as NASCAR or Formula One, drivers tend to race in close proximity to one another, and a distance of less than a car length may be required for optimal performance. In these cases, drivers rely on their skills and experience to maintain a safe distance while still pushing their limits to remain competitive.


Problem statement (adversarial planning): How to plan a racing trajectory to optimally overtake / defend an opponent? Plan in the face of an opponent that is trying to get you to lose, but it also is incentivized to not crash.

Two subproblems:

  1. Predicting an opponent trajectory
  2. Planning around (in the case of attacking) or against (in the case of defending) opponent trajectory
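A minimal sketch of the two subproblems treated separately, to make the split concrete. Everything here is an assumption for illustration: the `CarState` fields, the constant-velocity opponent model, the 0.5 m car length, and the 0.4 m safety width are all hypothetical, and the raceline is assumed to sit at lateral offset 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CarState:
    s: float      # progress along the track centerline (m)
    d: float      # lateral offset from the centerline (m)
    speed: float  # speed along the track (m/s)


def predict_constant_velocity(opp: CarState, horizon_s: float = 2.0,
                              dt: float = 0.1) -> List[CarState]:
    """Subproblem 1: assume the opponent holds its speed and lateral offset."""
    steps = int(round(horizon_s / dt))
    return [CarState(opp.s + opp.speed * dt * k, opp.d, opp.speed)
            for k in range(1, steps + 1)]


def pick_lateral_offset(ego: CarState, prediction: List[CarState],
                        candidates: List[float], dt: float = 0.1,
                        car_length_m: float = 0.5,
                        safety_width_m: float = 0.4) -> Optional[float]:
    """Subproblem 2: keep the offsets that never overlap the predicted opponent,
    then pick the one closest to the raceline (assumed at d = 0)."""
    feasible = []
    for d in candidates:
        conflict = False
        for k, opp in enumerate(prediction, start=1):
            ego_s = ego.s + ego.speed * dt * k  # ego also rolled at constant speed
            if abs(ego_s - opp.s) < car_length_m and abs(d - opp.d) < safety_width_m:
                conflict = True
                break
        if not conflict:
            feasible.append(d)
    return min(feasible, key=abs) if feasible else None


ego = CarState(s=0.0, d=0.0, speed=6.0)
opponent = CarState(s=2.0, d=0.0, speed=4.0)
prediction = predict_constant_velocity(opponent)
chosen = pick_lateral_offset(ego, prediction, [0.0, 0.3, -0.6, 0.8])
```

The `dt` passed to both functions has to match. The sketch ignores the fact that the opponent reacts to the chosen offset, which is exactly the coupling these notes worry about.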

But 1 and 2 seem like a chicken-and-egg problem: your opponent’s trajectory depends on the way you drive, but your plan depends on your opponent’s trajectory. The two have to be solved simultaneously.

Some avenues:

  • Planning in Belief Space (not quite)
    • This is usually when you have uncertainty about your own state. Doesn’t directly model uncertainty about your opponent. But you can treat your opponent as part of the environment.
  • Multi-Agent RL
    • Most obvious way, this is a competitive RL environment with 2+ agents
  • Imitation Learning
    • Look at expert demonstrations to help speed up training through initialization
  • Deep Reinforcement Learning with self-play
  • Curriculum Reinforcement Learning -> Idea of breaking learning into several stages: qualifying (objective is to minimize time), then racing (objective is to defeat adversary)
  • Game Theory
    • If head-to-head racing is modelled as a finite two-player zero-sum game, a Nash Equilibrium (possibly in mixed strategies) is guaranteed to exist. It can be approximated with CFR
  • Transfer Learning using a neural network as a base function, which learns high level features on how the car drives
  • Probabilistic Deep Learning to learn the opponent’s global planner directly. Draw the paths that it is following.
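To get a feel for the game-theory avenue, here is a sketch of regret matching in self-play on a tiny zero-sum matrix game. In a game with a single decision per player, CFR reduces to this update. The payoff numbers are invented for illustration: rows are the attacker's choices (inside, outside), columns are the defender's choices (cover inside, cover outside), and each entry is a made-up probability that the overtake succeeds.

```python
def regret_matching(payoff, iterations=20000):
    """Approximate a Nash equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives the
    negative. Returns the average strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    regret_row, regret_col = [0.0] * n_rows, [0.0] * n_cols
    sum_row, sum_col = [0.0] * n_rows, [0.0] * n_cols

    def strategy(regrets):
        # Play each action in proportion to its positive cumulative regret.
        positive = [max(r, 0.0) for r in regrets]
        total = sum(positive)
        if total > 0.0:
            return [p / total for p in positive]
        return [1.0 / len(regrets)] * len(regrets)

    for _ in range(iterations):
        p, q = strategy(regret_row), strategy(regret_col)
        # Expected payoff of each pure action against the other player's mix.
        u_row = [sum(payoff[i][j] * q[j] for j in range(n_cols)) for i in range(n_rows)]
        u_col = [-sum(payoff[i][j] * p[i] for i in range(n_rows)) for j in range(n_cols)]
        v_row = sum(p[i] * u_row[i] for i in range(n_rows))
        v_col = sum(q[j] * u_col[j] for j in range(n_cols))
        for i in range(n_rows):
            regret_row[i] += u_row[i] - v_row
            sum_row[i] += p[i]
        for j in range(n_cols):
            regret_col[j] += u_col[j] - v_col
            sum_col[j] += q[j]

    return ([x / iterations for x in sum_row],
            [x / iterations for x in sum_col])


# Hypothetical overtake-success probabilities (attacker rows, defender columns).
overtake_game = [[0.2, 0.9],
                 [0.7, 0.1]]
attacker_mix, defender_mix = regret_matching(overtake_game)
```

Solving this 2x2 game by hand gives an attacker who goes inside with probability 6/13 and a defender who covers the inside with probability 8/13, and the averaged strategies should land near those values. Real racing has continuous states and actions, so this only illustrates the idea, not a usable planner.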

Taking slower racing lines wins you races

When you do head to head racing, you actually sometimes take a slower racing line than your opponent, and get ahead of them. This is counterintuitive, but it is because you get in front of your opponent before the corner, and therefore force them to slow down.

You should watch Why the Racing Line Doesn’t Win Races to get intuition.

Also watch this video for a practical guide to overtaking to get good intuition.


Idea: Combine simulation (to get overtaking knowledge) (part 1) with real-world learning (to learn the dynamics) (part 2).

Limitations of FormulaZero.

Interesting work here: http://www.mpc.berkeley.edu/research/adaptive-and-learning-predictive-control

  • Have an AI that learns a representation of the track, without explicitly feeding it one (this can just be learned with SLAM, it’s not hard)
  • Plan around an opponent


  • Idea: MPC, + RL? How do these two come together?
  • Game Theory with Nash Equilibrium finding
  • Self-play, vs learning from demonstrations? NO, this is a solution
  • AI reasoning. Reasoning vs. optimization?

Getting intuition: See Raceline Optimization, and watch Why the Racing Line Doesn’t Win Races

  • Hamilton always brakes later, he has good racecraft, his choice of lines, higher straight lines, better tire management, more consistent laps
  • The racing line will only get you so far

You don’t even know how to get started, because you don’t even know what to solve. This is the Meta-Problem. Hard things mean more thinking.

What is the question / problem?

Problem 1: How to predict how your opponent is going to behave?

  • Is it even possible to predict how your opponent is going to drive?
    • Yes, drivers usually drive the same way every lap. There are patterns. Unless it’s a really bad driver? But maybe a driver drives differently at the beginning to test the track conditions.

Problem 2: How do you behave (plan) given that you know how your opponent is going to behave?

  • This question doesn’t make sense, because your opponent drives depending on how you drive as well. So how can you have certainty about the way they behave? So there is a chicken-and-egg problem.
    • So you make some assumptions about the way your opponent behaves. Self-play
    • Your opponent is going to be defending, knowing that you want to overtake, so you need to adjust. But you don’t need to do beliefs, just observe where your opponent is going
  • Urban driving has this problem too: You see a civilian waiting at the red traffic light. Because you have underlying rules, you would slow down to minimize the risk of collision? This seems like an unsolved problem. See Multi-Agent System

Problem 3: What makes an F1 racer so good? Are F1 racers the best racing drivers in the world?

Problem 4: How to make computations faster?

  • Current solution: Transformers, parallelize

Problem 5: How to deal with uncertainty of the track, uncertainty of your opponent?

Problem 6: What are the rules of head to head racing? What prevents my solution from just blocking you?

  • But “defending” is an important concept in real world racing. Legal and illegal moves. So I think the first thing is approaching formalization? Not interested

Problem 7: How do you teach a robot about the rules of the world?

  • For an illegal move, either make that a super negative reward, or don’t make it an option

Problem 8: How to enforce these rules?

Problem 9: The dynamics of the car are a non-trivial problem
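Problem 7 names two ways to handle an illegal move: give it a very negative reward, or remove it from the options. A small sketch of both, assuming a discrete action set. The penalty magnitude and the function names are hypothetical.

```python
from typing import List

ILLEGAL_PENALTY = -100.0  # hypothetical magnitude, would need tuning


def penalized_reward(base_reward: float, is_illegal: bool) -> float:
    """Option 1: the move stays available, but taking it is heavily punished."""
    return ILLEGAL_PENALTY if is_illegal else base_reward


def masked_argmax(action_values: List[float], legal_mask: List[bool]) -> int:
    """Option 2: illegal moves are never considered when picking an action."""
    best = None
    for idx, (value, legal) in enumerate(zip(action_values, legal_mask)):
        if legal and (best is None or value > action_values[best]):
            best = idx
    if best is None:
        raise ValueError("no legal action available")
    return best


# Action 1 has the highest value but is marked illegal, so action 2 is chosen.
choice = masked_argmax([1.0, 5.0, 3.0], [True, False, True])
```

With the penalty the agent can still break the rule if the payoff is large enough. With masking it cannot, but someone has to write the rule down as a mask first, which is the formalization question from Problem 6.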

I don’t think using Nash equilibrium in this context makes much sense. You want to try to maximally exploit the opponent? The goal is to win the race, not to minimize the time it takes to complete a race.

  • You should take the slower racing line, if that means slowing your opponent down

The path is constrained by the track bounds, as well as the path that the opponent is taking. What is the cost function?

  • In urban driving, you already have a preset path, so it is relatively easy to do control, since you know your target points, you have PID Control and MPC for more advanced
  • In pure pursuit (1 driver setting), you do Raceline Optimization offline.
  • In racing, when you are far from your opponent, you just follow the racing line. But when you are close, you solve for a new plan.
    • What if you are in front? You should also make a plan to defend

So 3 types of plans:

  • Pure pursuit plan (it’s following the optimal racing line)
  • Defending plan (reward: if you are in front of the opponent)
  • Attacking plan (overtaking)
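The three plan types can be read as a mode switch driven by the gap to the opponent. A minimal sketch, assuming the gap is measured as progress along the track and that a single engage distance decides when the opponent matters. The 3 m threshold is a made-up value, and wrap-around at the start line is ignored.

```python
from enum import Enum


class Plan(Enum):
    RACELINE = "raceline"  # follow the offline optimal racing line
    DEFEND = "defend"      # we are ahead and the opponent is close
    ATTACK = "attack"      # opponent is ahead and close, try to overtake


def select_plan(gap_m: float, engage_distance_m: float = 3.0) -> Plan:
    """gap_m = opponent progress minus ego progress (positive: opponent ahead)."""
    if abs(gap_m) > engage_distance_m:
        return Plan.RACELINE
    return Plan.ATTACK if gap_m > 0 else Plan.DEFEND


far = select_plan(10.0)
behind_opponent = select_plan(1.5)
ahead_of_opponent = select_plan(-1.0)
```

A hard switch like this is only a starting point. It says nothing about what the attacking or defending plan actually does once selected.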

Minimax situation.

The question is, you want to generate new plans based on your opponent.

It should have a model of the opponent.

I feel like it is really hard to explicitly tell the agent how to attack and defend?

I am thinking of doing Transfer Learning using a neural network as a base function, which learns high level features on how the car drives. Use Probabilistic Deep Learning? Instead of learning the opponent’s global planner directly. Draw the paths that it is following.

I think some form of Deep RL is required to do reasoning? Or not.

  • Traditional RL doesn’t have this concept of reasoning about what an “opponent” might do. Think playing Atari breakout. I mean MCTS + RL is usually the way

For the AI to learn about ways to overtake, reinforcement learning?

  • State = position on the track + information about the opponent
  • policy = speed and angle of the opponent?

But formulating the reward is very difficult?

  • reward = - time to complete lap
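One way the reward could be written down per step instead of per lap, as a sketch only. Rewarding progress along the track is a dense stand-in for negative lap time, and the extra term rewards gaining on the opponent, since the goal is to win rather than to be fast. The `Observation` fields, `lead_weight`, and `crash_penalty` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    ego_progress_m: float  # distance travelled along the track by our car
    opp_progress_m: float  # distance travelled along the track by the opponent
    crashed: bool


def step_reward(prev: Observation, curr: Observation,
                lead_weight: float = 0.1, crash_penalty: float = 50.0) -> float:
    if curr.crashed:
        return -crash_penalty
    # Progress made this step: maximizing it roughly minimizes lap time.
    progress = curr.ego_progress_m - prev.ego_progress_m
    # Change in our lead over the opponent (positive when we gain on them).
    lead_change = ((curr.ego_progress_m - curr.opp_progress_m)
                   - (prev.ego_progress_m - prev.opp_progress_m))
    return progress + lead_weight * lead_change


before = Observation(ego_progress_m=0.0, opp_progress_m=1.0, crashed=False)
after = Observation(ego_progress_m=0.5, opp_progress_m=1.3, crashed=False)
reward = step_reward(before, after)
```

How to weigh the two terms is exactly the hard part: too much weight on lead and the agent may learn to block, too little and it just drives the racing line.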

But I am hard coding this, instead of explicitly writing something for it.

  • What are the target points generated? You need to search those ideal trajectories. And do it fast. Why don’t we apply Monte-Carlo Tree Search

I think learning-based methods.

If you consider problem 1 and 2 separately, you do:

  • problem 1 -> prediction, a model of your opponent
  • problem 2 -> some sort of robust optimization
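The simplest form of "robust optimization" for problem 2 is a worst-case choice: given several predicted opponent behaviours, pick the plan whose worst outcome is least bad. A sketch with invented plan names, scenario names, and costs:

```python
from typing import Dict, Tuple


def robust_choice(cost_table: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """cost_table[plan][scenario] is the predicted cost of a plan if the
    opponent follows that scenario. Returns the plan with the lowest
    worst-case cost, together with that cost."""
    worst = {plan: max(costs.values()) for plan, costs in cost_table.items()}
    best_plan = min(worst, key=worst.get)
    return best_plan, worst[best_plan]


# Hypothetical costs (lower is better), e.g. expected time lost in the corner.
costs = {
    "inside":  {"opp_covers_inside": 9.0, "opp_stays_wide": 2.0},
    "outside": {"opp_covers_inside": 4.0, "opp_stays_wide": 5.0},
    "follow":  {"opp_covers_inside": 6.0, "opp_stays_wide": 6.0},
}
plan, worst_cost = robust_choice(costs)
```

The inside line is best if the opponent stays wide, but the worst-case choice avoids it because it is terrible when the opponent covers. This is the pessimistic end of the spectrum. Exploiting a predictable opponent would mean weighting the scenarios instead of taking the max.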

Also, racing in real life is much harder. The dynamics are a non-trivial problem.

Or if you want to combine them, you can do like Multi-Agent RL.

But knowing how your opponent is going to behave is a big assumption.

Fundamental truths


  • There exists an optimal solution?

My intuition is that you just follow the optimal racing line and adjust based on what the opponent is doing.

I need to stop relying on the way humans think to approach a solution

  • Reasoning by analogy is terrible

Are humans predictable? Is that the right question, because it is difficult to answer? Can you predict what I am going to eat for dinner? Knowing what I ate yesterday. And looking at my patterns. If you have all the information in the world, but still, it might be a little hard

Optimal racing line. There are some great videos about the optimal racing line.

Underlying assumption:

  • Humans have free will. But I think free will is an illusion. Just like consciousness. So if we don’t have free will, then this world is deterministic. Then you get into fundamental questions like the origin of the universe.

The question is, how can we create a set of policies such that we can be as fast as possible, without knowing how our opponent is going to behave?

We ultimately want to start from the back of the grid, and then overtake our opponents as efficiently as possible while avoiding crashing.

  • This is really interesting because it’s like “I really want to beat you, but I need to collaborate with you so I make sure we don’t crash”

I saw this idea of Curriculum Reinforcement Learning through this paper.

See FormulaZero Paper for my comments.

I was discussing with Yash why we can’t just use a Genetic Algorithm. Because we want to generate a set of policies?

  • Modelling your opponent isn’t really realistic, but I mean yea at first?
  • So yes, FormulaZero generates a set of opponents, and then tries to do well against all these kinds of opponents

Idea: We want our policy to be robust to a variety of agents, but we also want to implement some sort of real-time policy improvement. I really like the Curriculum-Based learning method, maybe combine that with some sort of genetic algorithm?
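A toy sketch of the genetic-algorithm part of this idea: evolve a policy so that it scores well on average against a whole set of opponents. To keep it runnable, the "policy" is a single number and the score against each opponent is an invented quadratic. Nothing here comes from FormulaZero or any real simulator.

```python
import random
from typing import List, Tuple


def fitness(policy: float, opponents: List[float]) -> float:
    """Toy robustness score: average (negative) mismatch against all opponents."""
    return sum(-(policy - o) ** 2 for o in opponents) / len(opponents)


def evolve(opponents: List[float], generations: int = 30, pop_size: int = 20,
           sigma: float = 0.1, seed: int = 0) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 1.0) for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, opponents), reverse=True)
        elite = ranked[: pop_size // 4]  # keep the top quarter unchanged
        history.append(fitness(elite[0], opponents))
        # Children are mutated copies of randomly chosen elite members.
        children = [rng.choice(elite) + rng.gauss(0.0, sigma)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    best = max(population, key=lambda p: fitness(p, opponents))
    return best, history


opponent_set = [0.2, 0.5, 0.8]  # hypothetical opponent parameters
best_policy, best_history = evolve(opponent_set)
```

Because the elite are carried over unchanged, the best fitness never gets worse from one generation to the next. In the real problem each fitness evaluation would be a batch of simulated races, and the curriculum would decide which opponents are in the set at each stage.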


Let’s think about it, how do humans learn how to race? They learn to brake as late as possible, which allows them to overtake opponents

  • But you learn this for qualifying

Also, how do humans drive on qualifying vs in an actual race?

  • Well qualifying is not a fair comparison, because they have unlimited DRS

Look at this video for the Overtaking Rules in F1

  • There are a few gray areas, about the overtaking rules

https://www.youtube.com/watch?v=Hu94DaDEbj8&ab_channel=Motorsport.com

You cannot just divebomb.

There is this rule about weaving. So how do we encode these rules in F1TENTH? This was what Soham was working on. Dead end.