SSD (Single Shot MultiBox Detector)
Original Paper: https://arxiv.org/abs/1512.02325
Guide: https://developers.arcgis.com/python/guide/how-ssd-works/
Summary: SSD is an object detector that uses a single deep neural network. It places a fixed set of default boxes with different scales and aspect ratios at each location of its feature maps. At prediction time, it scores each default box for every object category and adjusts the box to fit the object better.
The key innovation of SSD is that it eliminates region proposal generation and the subsequent pixel/feature resampling stage, so detection runs in a single forward pass.
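To make the box adjustment step concrete, here is a minimal Python sketch of how predicted offsets could be applied to one default box. It assumes the center-size encoding described in the SSD paper (center shifts scaled by the default box size, log-scale width and height). The name `decode_box` is hypothetical, and real implementations often add extra scaling factors ("variances") that are left out here.

```python
import math

def decode_box(default_box, offsets):
    """Apply predicted offsets to a default box.

    Both boxes use (cx, cy, w, h) in normalized [0, 1] image coordinates.
    Assumed encoding: center shifts are relative to the default box size,
    width and height changes are predicted in log space.
    """
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    cx = d_cx + t_cx * d_w      # shift center by a fraction of the box width
    cy = d_cy + t_cy * d_h      # shift center by a fraction of the box height
    w = d_w * math.exp(t_w)     # rescale width
    h = d_h * math.exp(t_h)     # rescale height
    return (cx, cy, w, h)
```

With all offsets at zero the default box comes back unchanged, which is a quick sanity check for any decoder of this form.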
Key Points:
- Feature maps from multiple layers, at different resolutions, handle objects of different scales
- Separate convolutional filters predict boxes for each default-box shape (aspect ratio) and size
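The two points above can be sketched in Python as a default-box generator: one scale per feature map, and several aspect ratios per location. This is a simplified sketch, not the reference implementation. The scale range 0.2 to 0.9 follows the paper's formula, while the function names, the reduced aspect-ratio set, and the square feature map are assumptions made for brevity (the paper also adds one extra box per location for aspect ratio 1).

```python
import math

def layer_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (fraction of image size)."""
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in [0, 1] coordinates for one square feature map."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Center each box on its feature map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # Same area for every ratio; only the shape changes.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes
```

A coarse feature map (small `fmap_size`) paired with a large scale covers big objects, while a fine feature map with a small scale covers small ones.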
Architecture
Image → Feature Extraction → Detection Heads → Non-Maximum Suppression (NMS)
- Feature Extraction: builds feature maps from the input image.
  - Uses VGG-16 as the base network
- Detection Heads: convert feature maps into predictions about boxes (class scores and box offsets).
- Non-Maximum Suppression (NMS): removes repeated detections of the same object.
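As an illustration of the last stage, here is a minimal pure-Python sketch of greedy NMS for a single class. Boxes are assumed to be corner-format tuples `(x1, y1, x2, y2)`; the function names are hypothetical, and the default threshold of 0.45 is the overlap value reported in the paper, used here only as a starting point.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.45):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a full detector this would typically be run once per class, after discarding low-confidence boxes, so the quadratic loop only sees a small number of candidates.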
Key lessons:
Data augmentation is crucial: it makes the model more robust to various object sizes and shapes.
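One detail that is easy to miss with detection data is that any geometric augmentation must transform the boxes together with the image. The sketch below shows this for a horizontal flip, one of the simpler augmentations used when training detectors. It assumes the image is a list of pixel rows and the boxes are normalized `(x1, y1, x2, y2)` tuples; `hflip` is a hypothetical helper, not part of any library.

```python
def hflip(image, boxes):
    """Flip an image (list of rows) and its normalized boxes left to right."""
    flipped_image = [row[::-1] for row in image]
    # Mirror x coordinates around the image center; x1 and x2 swap roles.
    flipped_boxes = [(1.0 - x2, y1, 1.0 - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped_image, flipped_boxes
```

Crops and zooms, which change the apparent object size, follow the same pattern but also need to clip or drop boxes that fall outside the new image.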