3D Object Detection
Camera only:
- https://github.com/lzccccc/3d-bounding-box-estimation-for-autonomous-driving
- 3D Deep Learning Tutorial: https://www.youtube.com/watch?v=vfL6uJYFrp4&ab_channel=SULabUCSanDiego
Lidar detection is even more powerful than camera detection because it gives you depth information directly: https://medium.datadriveninvestor.com/lidar-3d-perception-and-object-detection-311419886bd2
A lot of the metrics and state-of-the-art models are released at https://github.com/open-mmlab/OpenPCDet.
It seems like a lot of the models use voxelization.
Check out this repository: https://github.com/isl-org/Open3D-ML
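Since voxelization comes up repeatedly in these models, here is a minimal sketch (not from any of the repositories above) of turning a raw point cloud into a binary occupancy grid with numpy; the `voxelize` helper and the 32x32x32 resolution are illustrative choices:

```python
import numpy as np

def voxelize(points, grid_size=32):
    """Map an (N, 3) point cloud into a (grid_size,)*3 occupancy grid.
    Hypothetical helper for illustration, not a library API."""
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid division by zero on flat axes
    # Normalize each point into [0, grid_size - 1] and round down to a voxel index.
    idx = ((points - mins) / spans * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size, grid_size, grid_size), dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # mark occupied voxels
    return grid

# Example: 1000 random points become a 32^3 occupancy grid.
cloud = np.random.rand(1000, 3)
grid = voxelize(cloud)
print(grid.shape)  # (32, 32, 32)
```

Real pipelines typically keep per-voxel features (point counts, mean coordinates, learned embeddings) rather than a plain 0/1 occupancy, but the binning step is the same.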
Representations
Learned from here.
- Volumetric Grids
  - Uses voxelization, low dimensionality (e.g. 32x32x32). It's suitable for 3D convolutions.
- Point Cloud
  - This is the form the input usually takes
- Polygon Meshes
  - High-quality representation
  - Hard to perform learning on
  - Graph NN: start with a sphere mesh and refine it to the specific shape
- Implicit Representation
  - We have a function f(x)
  - f(x) = 0 if x is on a surface
  - f(x) < 0 if x is inside a surface
  - f(x) > 0 if x is outside all surfaces
  - f can be approximated by a NN
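As a concrete instance of an implicit representation, here is the signed distance function of a sphere, with 0 on the surface, negative inside, and positive outside; a network like DeepSDF is trained to regress values like these (the `sphere_sdf` helper is illustrative):

```python
import numpy as np

def sphere_sdf(x, center=np.zeros(3), radius=1.0):
    """Signed distance from point x to a sphere:
    = 0 on the surface, < 0 inside, > 0 outside."""
    return np.linalg.norm(x - center) - radius

print(sphere_sdf(np.array([1.0, 0.0, 0.0])))  # 0.0  -> on the surface
print(sphere_sdf(np.array([0.0, 0.0, 0.0])))  # -1.0 -> inside
print(sphere_sdf(np.array([2.0, 0.0, 0.0])))  # 1.0  -> outside
```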
3D Object Detection metrics also use mAP
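For 3D mAP, the matching criterion is IoU between 3D boxes. Benchmarks like KITTI use rotated boxes; as a simplified sketch of the idea, here is IoU for axis-aligned boxes:

```python
import numpy as np

def iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes, each given as
    (xmin, ymin, zmin, xmax, ymax, zmax). Simplified: real
    benchmarks evaluate rotated (yaw-oriented) boxes."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(a[:3], b[:3])          # intersection min corner
    hi = np.minimum(a[3:], b[3:])          # intersection max corner
    inter = np.prod(np.clip(hi - lo, 0, None))  # 0 if no overlap
    vol_a = np.prod(a[3:] - a[:3])
    vol_b = np.prod(b[3:] - b[:3])
    return inter / (vol_a + vol_b - inter)

# Two unit cubes overlapping over half of one axis:
print(iou_3d((0, 0, 0, 1, 1, 1), (0.5, 0, 0, 1.5, 1, 1)))  # 0.5 / 1.5 ≈ 0.333
```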
To Read Personally:
2D vs 3D Object Detection
The reason we use 2D object detection more than 3D (lidar) is that computer vision has been around longer, so more research has gone into it. That is why PointPillars converts the lidar points into a 2D pseudo-image and then feeds it into a camera-style object detection backbone.
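The "convert points, then use a 2D detector" idea can be sketched by scattering lidar points into a bird's-eye-view grid. This is a heavy simplification of PointPillars (which learns per-pillar features with a small PointNet before scattering); the ranges, cell size, and `points_to_bev` helper are illustrative assumptions:

```python
import numpy as np

def points_to_bev(points, x_range=(0, 40), y_range=(-20, 20), cell=0.5):
    """Collapse the z axis: count lidar points per bird's-eye-view cell,
    producing a 2D 'pseudo-image' a camera-style detector could consume."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    mask = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    np.add.at(bev, (ix[mask], iy[mask]), 1.0)  # point count per cell
    return bev

# 5000 synthetic lidar points in a 40m x 40m region:
pts = np.random.uniform([0, -20, -2], [40, 20, 2], size=(5000, 3))
bev = points_to_bev(pts)
print(bev.shape)  # (80, 80)
```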
Labeling is also usually done in 2D. It is much harder to get ground truth labels in 3D.