Convolutional Neural Network (CNN)
These are very important for Image Classification and Object Detection.
Whenever I'm ready to implement a CNN from scratch, I should watch the ConvNet in Practice lecture (linked below).
Why do we need CNNs? With a vanilla neural network (MLP), every pixel becomes an input feature, so there are simply too many weights to learn; the dimensionality is way too high. A CNN still works like a neural network, but you only learn the weights of small kernels/filters, so the parameter count stays small.
At the end, the feature maps are flattened and an MLP does the classification.
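A minimal sketch of this idea in PyTorch (the 3×32×32 input and the 10 output classes are just illustrative assumptions, not anything from the lecture):

```python
import torch
import torch.nn as nn

# Minimal CNN: learn small kernels, pool, flatten, then an MLP head.
# Assumes 3-channel 32x32 images and 10 classes (illustrative choices).
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x32x32 -> 16x32x32
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x32x32 -> 16x16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
    nn.ReLU(),
    nn.MaxPool2d(2),                              # -> 32x8x8
    nn.Flatten(),                                 # -> 2048 features
    nn.Linear(32 * 8 * 8, 10),                    # MLP head for classification
)

logits = model(torch.randn(1, 3, 32, 32))         # shape: (1, 10)
```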
Three Types of Layers in ConvNet architecture:
- Convolution (CONV) layer
- A lot of the work in designing ConvNets is selecting the hyperparameters: padding, stride, number of filters, filter size, etc. (a quick output-size check is sketched after this list)
- Pooling (POOL) layer
- Fully Connected (FC) layer
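When picking those hyperparameters, it helps to sanity-check the spatial output size with the standard formula floor((n + 2p − f) / s) + 1. A tiny helper (the function name is my own):

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Spatial output size of a CONV/POOL layer:
    n = input size, f = filter size, p = padding, s = stride."""
    return (n + 2 * p - f) // s + 1

# e.g. a 32x32 input with a 5x5 filter, no padding, stride 1 -> 28x28
print(conv_output_size(32, 5, p=0, s=1))  # 28
```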
CNN Architectures
- Classic Networks
- LeNet-5 (1998)
- AlexNet
- First use of ReLU
- used LRN layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4 (this training recipe is sketched as an optimizer config after the architecture list)
- 7 CNN ensemble: 18.2% → 15.4%
- VGG
- ResNet
- Inception
- U-Net
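The AlexNet recipe above maps fairly directly onto a PyTorch optimizer setup. This is only a sketch of those hyperparameters (the placeholder model and shapes are assumptions); the manual ÷10 learning-rate drops are approximated with ReduceLROnPlateau:

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be the AlexNet-style CNN,
# whose classifier head already contains the dropout-0.5 layers.
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.5), nn.Linear(3 * 224 * 224, 1000))

# Recipe from the notes: SGD momentum 0.9, lr 1e-2, L2 weight decay 5e-4.
optimizer = torch.optim.SGD(
    model.parameters(), lr=1e-2, momentum=0.9, weight_decay=5e-4
)

# The LR was dropped by 10x manually whenever val accuracy plateaued;
# ReduceLROnPlateau automates the same rule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max", factor=0.1)

# Each epoch: train on mini-batches of 128, then call
#   scheduler.step(val_accuracy)
```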
Advantages
- Parameter Sharing: A feature detector (such as a vertical edge detector) that’s useful in one part of the image is probably useful in another part of the image.
- Sparsity of connections: In each layer, each output value depends only on a small number of inputs.
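A quick back-of-the-envelope check of why this matters (the 32×32×3 input and six 5×5 filters are chosen just for illustration): the conv layer needs only a few hundred parameters, while a fully connected layer producing the same output volume would need millions.

```python
# Parameter sharing: six 5x5 filters applied to a 32x32x3 image.
conv_params = (5 * 5 * 3 + 1) * 6                # 456 (weights + bias per filter)

# Fully connected layer producing the same 28x28x6 = 4704 outputs
# from the 3072 = 32*32*3 input pixels (plus biases).
fc_params = (32 * 32 * 3 + 1) * (28 * 28 * 6)    # ~14.5 million

print(conv_params, fc_params)
```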
Some Ideas
“Flattening” out the pooling layer.
ConvNet in Practice
This video is really helpful: https://www.youtube.com/watch?v=pA4BsUK3oP4&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC&index=11&ab_channel=AndrejKarpathy
Data Augmentation