Region Proposals
This is an interesting idea. However, Andrew Ng says that the end-to-end technique used by YOLO seems to be much more promising than a two-stage one.
- Propose regions (using Semantic Segmentation)
- This was previously done using traditional Computer Vision techniques, like Selective Search. However, modern techniques learn these regions.
- Make detections on those regions
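The two stages above can be sketched in a few lines of plain Python. This is only a toy illustration, not how any real detector is implemented: `propose_regions` and `score_region` are hypothetical stand-ins (a fixed grid of boxes instead of Selective Search or a learned proposer, and mean pixel intensity instead of a CNN classifier), and the 0.75 threshold is an arbitrary assumption.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive
Image = List[List[float]]


def propose_regions(image: Image, size: int, stride: int) -> List[Box]:
    """Stage 1 (hypothetical stand-in): emit fixed-size candidate boxes on a grid.

    A real system would use Selective Search or a learned proposer here.
    """
    h, w = len(image), len(image[0])
    return [
        (t, l, t + size, l + size)
        for t in range(0, h - size + 1, stride)
        for l in range(0, w - size + 1, stride)
    ]


def score_region(image: Image, box: Box) -> float:
    """Stage 2 (hypothetical stand-in): mean intensity as a fake confidence score.

    A real system would run a classifier on the region instead.
    """
    t, l, b, r = box
    pixels = [image[y][x] for y in range(t, b) for x in range(l, r)]
    return sum(pixels) / len(pixels)


def detect(image: Image, size: int = 2, stride: int = 1,
           threshold: float = 0.75) -> List[Tuple[Box, float]]:
    """Run stage 1, then stage 2 on every proposal, keeping confident boxes."""
    detections = []
    for box in propose_regions(image, size, stride):
        score = score_region(image, box)
        if score >= threshold:
            detections.append((box, score))
    return detections


# A 4x4 toy image with a bright 2x2 "object" in the middle.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
detections = detect(image)
print(detections)
```

Of the nine candidate boxes, only the one that exactly covers the bright patch clears the threshold. The point is the structure: the second stage never looks at the whole image, only at the regions the first stage hands it.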
This Medium article is also a worthwhile read.
Algorithms, noted from here
- R-CNN: Propose regions (these are not learned) using Selective Search (2000 regions generated). Classify proposed regions one at a time. Output label + bounding box (this is a more accurate bounding box). → 47s / image with VGG16, not feasible
- Ad hoc training objectives
- Fast R-CNN: Use a convolutional implementation of sliding windows to classify all the proposed regions. See Sliding Window for an explanation of how this is done.
- Faster R-CNN: Use CNN to propose regions.
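To make the cost difference between the first two entries concrete, here is a rough sketch that counts forward passes. Everything in it is an assumption for illustration: `CountingBackbone` is a hypothetical stand-in for a CNN (it just doubles each value), and the boxes are made up. Because this toy backbone works element by element, cropping before or after it gives identical results; in a real network the feature map is downsampled, so each box has to be projected onto it, and the outputs are not exactly equal.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive
Grid = List[List[float]]


class CountingBackbone:
    """Hypothetical stand-in for a CNN: doubles each value and counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, grid: Grid) -> Grid:
        self.calls += 1
        return [[2.0 * v for v in row] for row in grid]


def crop(grid: Grid, box: Box) -> Grid:
    """Cut the rectangle described by box out of a 2D grid."""
    t, l, b, r = box
    return [row[l:r] for row in grid[t:b]]


def rcnn_style(image: Grid, boxes: List[Box], backbone: CountingBackbone) -> List[Grid]:
    """One forward pass per proposed region (regions handled one at a time)."""
    return [backbone(crop(image, box)) for box in boxes]


def fast_rcnn_style(image: Grid, boxes: List[Box], backbone: CountingBackbone) -> List[Grid]:
    """One forward pass over the whole image, then crop the shared feature map."""
    features = backbone(image)
    return [crop(features, box) for box in boxes]


image = [[float(4 * y + x) for x in range(4)] for y in range(4)]
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 0, 4, 4)]

per_region = CountingBackbone()
shared = CountingBackbone()
slow = rcnn_style(image, boxes, per_region)
fast = fast_rcnn_style(image, boxes, shared)

print("forward passes, per-region:", per_region.calls)
print("forward passes, shared:", shared.calls)
```

With three boxes the per-region version makes three passes and the shared version makes one. With the 2000 regions mentioned above, the per-region count grows to 2000 while the shared count stays at one.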