3D Object Detection

Camera only:

https://medium.datadriveninvestor.com/lidar-3d-perception-and-object-detection-311419886bd2 — Lidar-based detection can be even more powerful than camera-only detection, because the sensor gives you depth information directly.

Many of the benchmark metrics are collected at https://github.com/open-mmlab/OpenPCDet, where the state-of-the-art models are released.

It seems like a lot of the models rely on voxelization.
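As a rough illustration of what voxelization means, here is a minimal sketch that quantizes a point cloud into a binary occupancy grid (real detectors such as those in OpenPCDet keep per-voxel point features, not just occupancy; the bounds and grid size here are arbitrary choices for the example):

```python
import numpy as np

def voxelize(points, grid_size=32, bounds=(-1.0, 1.0)):
    """Quantize an (N, 3) point cloud into a binary occupancy grid."""
    lo, hi = bounds
    # Map each coordinate into [0, grid_size) and clip out-of-bounds points.
    idx = ((points - lo) / (hi - lo) * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

# Three points land in three distinct voxels of the 32x32x32 grid.
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [-0.9, 0.2, 0.1]])
grid = voxelize(pts)
print(grid.sum())  # number of occupied voxels
```

The resulting dense grid is what makes 3D convolutions applicable, at the cost of memory growing cubically with resolution.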

Check out this repository: https://github.com/isl-org/Open3D-ML

Representations

Learned from here.

  • Volumetric Grids

    • Using voxelization; low dimensionality (e.g. 32×32×32), which makes it suitable for 3D convolutions
  • Point Cloud → this is usually how the input arrives

  • Polygon Meshes

    • High-quality representation
    • Hard to perform learning
      • Graph NN
      • Start with a sphere mesh and iteratively refine it toward the target shape
  • Implicit Representation

    • we have a function f : ℝ³ → ℝ
    • f(x) = 0 if x is on a surface
    • f(x) < 0 if x is inside a surface
    • f(x) > 0 if x is outside all surfaces
    • f can be approximated by a NN
  • PointPillars
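The implicit-representation idea above can be sketched with a simple analytic signed distance function; a sphere is used here purely as a stand-in for the function a neural network would learn:

```python
import numpy as np

def sphere_sdf(x, radius=1.0):
    """Signed distance to a sphere centered at the origin.

    The zero level set is the surface, negative values are inside,
    positive values are outside. In practice this function would be
    approximated by a trained neural network.
    """
    return np.linalg.norm(x, axis=-1) - radius

pts = np.array([[1.0, 0.0, 0.0],   # on the surface
                [0.0, 0.0, 0.0],   # inside
                [2.0, 0.0, 0.0]])  # outside
d = sphere_sdf(pts)
print(d)  # surface -> 0, inside -> negative, outside -> positive
```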

3D Object Detection metrics also use mAP
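Matching predictions to ground truth for mAP needs a 3D overlap measure. Here is a simplified IoU for axis-aligned 3D boxes; real benchmarks (KITTI, nuScenes) use rotated boxes or center-distance matching, but the mAP machinery on top (match by overlap, sweep confidence, integrate precision over recall) works the same as in 2D:

```python
import numpy as np

def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (x1, y1, z1, x2, y2, z2)."""
    lo = np.maximum(a[:3], b[:3])       # lower corner of the intersection
    hi = np.minimum(a[3:], b[3:])       # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0, None))  # 0 if boxes don't overlap

    def vol(box):
        return np.prod(box[3:] - box[:3])

    return inter / (vol(a) + vol(b) - inter)

a = np.array([0, 0, 0, 2, 2, 2.0])
b = np.array([1, 1, 1, 3, 3, 3.0])
iou = iou_3d(a, b)
print(iou)  # intersection volume 1, union 15 -> ~0.0667
```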

To Read Personally:

  • PV-RCNN++ β†’ Best performing
    • PV-RCNN, what Alvin presented at the reading group
  • GLENet

2D vs 3D Object Detection

The reason 2D object detection is used more than 3D (lidar) detection is that computer vision on images has been around longer, so more research has gone into it. Approaches like PointPillars exploit this: the point cloud is converted into a 2D bird's-eye-view pseudo-image of pillar features, which can then be fed into a 2D convolutional detection backbone.
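A toy sketch of the PointPillars pseudo-image idea follows. The real method learns a per-pillar feature with a small PointNet before scattering; here each pillar is summarized by max point height just to keep the example self-contained (grid size and bounds are made up for illustration):

```python
import numpy as np

def pillars_to_bev(points, grid=(4, 4), xy_bounds=(-1.0, 1.0)):
    """Collapse an (N, 3) point cloud into a BEV pseudo-image.

    Each cell (pillar) is a vertical column over the x-y plane; its
    feature here is the max z of the points that fall into it.
    """
    lo, hi = xy_bounds
    h, w = grid
    # Assign each point to a pillar by quantizing its x-y position.
    ij = ((points[:, :2] - lo) / (hi - lo) * np.array([h, w])).astype(int)
    ij = np.clip(ij, 0, np.array([h - 1, w - 1]))
    bev = np.zeros((h, w))
    for (i, j), z in zip(ij, points[:, 2]):
        bev[i, j] = max(bev[i, j], z)  # max height as the pillar feature
    return bev

pts = np.array([[0.1, 0.1, 0.5],    # two points share one pillar...
                [0.1, 0.1, 0.8],
                [-0.9, -0.9, 0.3]])  # ...one lands in another
bev = pillars_to_bev(pts)
```

The pseudo-image `bev` is an ordinary 2D tensor, which is exactly what lets a standard 2D convolutional detector consume lidar data.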

Labeling is also usually done in 2D; it is much harder to get ground-truth labels in 3D.