WATonomous Perception

Also see WATonomous Cheatsheet.

TODO: get in touch with Aryan

Perception will do all of the following (taken from Multi-Task Learning):

  1. Object Detection
    1. Traffic Signs (usually from HD Map)
    2. Traffic Lights (usually from HD Map)
    3. Cars
      • What velocity is it moving at? Is it static or moving?
      • Is the left or right blinker on? This helps predict other vehicles’ trajectories.
      • What kind of car? Most important is emergency vehicles (e.g. ambulances), since we need to yield to them; the CARLA AD Challenge has a penalty of 0.7 for not yielding.
      • Lower priority: is the car’s door open or closed?
    4. Traffic Cones
    5. Pedestrians, see what Zoox can do
      • Are they looking at their phone? Are they paying attention to the road?
      • Are they walking or standing still? (action classification)
      • What kind of person (child, adult, senior)?
    6. Road Markings (usually from HD Map)
  2. Semantic Segmentation
    1. Which parts of the road are drivable? (usually from HD Map)
      1. Where are the lane lines?
      2. Where are the road curbs?
      3. Where are the crosswalks?


This is where we want to be by the end of Apr 2024.

Obstacle vs. detection: I don’t think it makes sense to split outputs into obstacle vs. non-obstacle. Semantically, for the model that we train, everything is just a “detection”.
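
A minimal sketch of what “everything is just a detection” could look like as a data type. The field names, the attribute keys, and the `is_obstacle` rule are all hypothetical, not something already in the stack; the point is only that the obstacle decision can live downstream of the model output.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Detection:
    """One model output; no obstacle/non-obstacle split at this level."""
    label: str                                    # e.g. "car", "pedestrian", "traffic_sign"
    box_xyxy: Tuple[float, float, float, float]   # pixel corners: x_min, y_min, x_max, y_max
    score: float                                  # model confidence in [0, 1]
    attributes: Dict[str, str] = field(default_factory=dict)  # optional per-class extras


def is_obstacle(det: Detection) -> bool:
    """Hypothetical downstream rule; the class set is a placeholder."""
    return det.label in {"car", "pedestrian", "traffic_cone"}


# Example values are made up for illustration.
ambulance = Detection("car", (100.0, 80.0, 260.0, 200.0), 0.91,
                      {"type": "emergency", "blinker": "left"})
stop_sign = Detection("traffic_sign", (400.0, 50.0, 440.0, 90.0), 0.88)
```

With this shape, whoever consumes the detections (planning, tracking) decides what counts as an obstacle, and the model only has to emit one kind of thing.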

Other things to think about

  • Predictions in what coordinate frame? Is perception supposed to take care of this, or something later in the stack? Which reference frames are involved?
  • You are assuming that a single sensor is coming in. What happens when you have multiple cameras? Look at Tesla.
  • Robustness of predictions?
  • How are you gonna do Multi-Task Learning with all these sensors?
  • More channeled ROS2 topics? Right now, I’m thinking of putting all the relevant info into a string label.
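
To make the string-label idea from the last bullet concrete, here is a rough sketch of packing a class name plus attributes into one string and parsing it back. The `;` and `=` separators and the function names are assumptions chosen for illustration, not an existing convention in the repo.

```python
from typing import Dict, Tuple


def encode_label(cls: str, attrs: Dict[str, str]) -> str:
    """Pack a class name and attributes into one string, e.g. 'car;blinker=left'."""
    for text in (cls, *attrs.keys(), *attrs.values()):
        if ";" in text or "=" in text:
            raise ValueError(f"reserved character in {text!r}")
    # Sort keys so the same attributes always produce the same string.
    parts = [cls] + [f"{key}={value}" for key, value in sorted(attrs.items())]
    return ";".join(parts)


def decode_label(label: str) -> Tuple[str, Dict[str, str]]:
    """Inverse of encode_label: split the class name from its attributes."""
    cls, *pairs = label.split(";")
    attrs = dict(pair.split("=", 1) for pair in pairs)
    return cls, attrs
```

The trade-off: this needs no new message definitions, but every value becomes a string (a velocity would have to be parsed back into a number) and every consumer has to know the format. More channeled topics or a dedicated message type would avoid that at the cost of more plumbing.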

Future directions / for people who want to read papers and not code as much

This is interesting work. But arguably not as high priority.

Current Perception Stack (on monorepo_v1)


Where we’re going: https://drive.google.com/file/d/1VMyHuNRETZ5gWRdH9H7obLK6-alnZBPx/view?usp=sharing

  • Update 2023-01-27: I think complexifying all of this by introducing all these modalities is stupid. You should be problem-oriented: what predictions are you trying to make? Refer to the hand-drawn chart above.


Immediate Focus:

  1. Download the ONNX version of the model weights
  2. Get the template from IbaiGorordo here
  3. Run inference!
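
A rough sketch of step 3, assuming the `onnxruntime` and `numpy` packages are installed. The notes do not say which model this is, so the input layout (a float32 `(3, H, W)` image scaled to `[0, 1]`) and the square letterboxed input are assumptions; the actual template may preprocess differently. `letterbox_params` is a hypothetical helper, not part of any library.

```python
from typing import Tuple


def letterbox_params(src_w: int, src_h: int, dst: int) -> Tuple[float, int, int]:
    """Scale and padding that fit a src_w x src_h image into a dst x dst square."""
    scale = min(dst / src_w, dst / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x, pad_y = (dst - new_w) // 2, (dst - new_h) // 2
    return scale, pad_x, pad_y


def run_inference(model_path: str, image_chw):
    """One forward pass. image_chw: float32 array of shape (3, H, W) (assumed layout)."""
    # Imported lazily so the helper above works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    batch = np.expand_dims(image_chw, axis=0).astype(np.float32)  # (1, 3, H, W)
    return session.run(None, {input_name: batch})  # list of raw output arrays
```

The raw outputs still need model-specific decoding (boxes, scores, class ids), which is the part the template is expected to cover.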

Personal Notes

To be an expert in perception, I need to:

Future Research directions:

NO, I think main thing is to get really good at engineering.


Papers with Code, interesting topics:

Literature Reviews:



Blog for object detection:


Camera is useful for: