Convolutional Neural Network (CNN)

These are very important for Image Classification and Object Detection.

Whenever I'm ready to implement a CNN from scratch, I should rewatch the ConvNet in Practice lecture (linked below).

Why do we need CNNs? With a vanilla neural network (MLP), every pixel is an input feature, so for realistic image sizes there are simply too many parameters; the dimensionality is way too high for the network to learn. A CNN works like a neural network, except you only learn the weights of small kernels/filters, and those weights are shared across the whole image, so there are far fewer parameters.
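
To make that concrete, a back-of-the-envelope parameter count (the 1000×1000 image and the layer sizes are my own example numbers):

```python
# Rough parameter-count comparison: fully connected layer vs. conv layer
# on a 1000x1000 RGB image (illustrative numbers only).

image_features = 1000 * 1000 * 3          # every pixel/channel is an input feature

# MLP: one hidden layer of 1000 units, fully connected to every input
mlp_params = image_features * 1000        # 3,000,000,000 weights

# CNN: 64 filters of size 5x5 over 3 input channels, shared across all positions
cnn_params = 64 * (5 * 5 * 3 + 1)         # 4,864 parameters (+1 per filter for bias)

print(f"MLP layer:  {mlp_params:,} parameters")
print(f"Conv layer: {cnn_params:,} parameters")
```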

At the end, the feature maps are flattened and passed through an MLP for classification (shown in the sketch after the layer list below).

Three Types of Layers in ConvNet architecture:

  • Convolution (CONV) layer
    • A lot of the work in designing ConvNets is selecting the hyperparameters: padding, stride, number of filters, filter size, etc. (see the sketch after this list)
  • Pooling (POOL) layer
  • Fully Connected (FC) layer
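
A minimal PyTorch sketch wiring the three layer types together (the 32×32 input, channel counts, and 10 classes are assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    # CONV layer: hyperparameters are number of filters, kernel size, stride, padding
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    # POOL layer: downsamples each feature map, no learned weights
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    # Flatten the final feature maps, then an FC (MLP) head does the classification
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),  # 32x32 input halved twice by pooling -> 8x8
)

x = torch.randn(1, 3, 32, 32)   # dummy batch: one 32x32 RGB image
print(model(x).shape)           # torch.Size([1, 10])
```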

CNN Architectures

  • Classic Networks
    • LeNet-5 (1998)
    • AlexNet
      • First use of ReLU
      • Used Local Response Normalization (LRN) layers (not common anymore)
      • Heavy data augmentation
      • Dropout 0.5
      • Batch size 128
      • SGD with momentum 0.9
      • Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
      • L2 weight decay 5e-4
      • 7-CNN ensemble: 18.2% → 15.4% top-5 error (this training recipe is sketched in code after the architecture list)
    • VGG
  • ResNet
  • Inception
  • U-Net
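
A quick PyTorch sketch of the AlexNet training recipe from the bullets above (the placeholder model and the ReduceLROnPlateau scheduler are my stand-ins; AlexNet actually dropped the learning rate by hand):

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder model so the example runs

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-2,            # initial learning rate
    momentum=0.9,       # SGD momentum
    weight_decay=5e-4,  # L2 weight decay
)

# Stand-in for the manual "divide lr by 10 when val accuracy plateaus" rule:
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.1
)
# after each epoch: scheduler.step(val_accuracy)
```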

Advantages

  1. Parameter sharing: a feature detector (such as a vertical edge detector) that’s useful in one part of the image is probably useful in another part of the image.
  2. Sparsity of connections: in each layer, each output value depends only on a small number of inputs. (Both advantages are visible in the toy convolution below.)
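
A hand-rolled toy convolution in NumPy makes both points concrete (illustrative only; strictly this is cross-correlation, which is what deep-learning libraries call convolution):

```python
import numpy as np

image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])  # vertical edge detector

out = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        # Parameter sharing: the SAME 9 kernel weights are reused at every position.
        # Sparse connections: each output depends on only a 3x3 patch of the input.
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

print(out.shape)  # (6, 6): "valid" convolution, 8 - 3 + 1 = 6
```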

Some Ideas

“Flattening” out the pooling layer.

ConvNet in Practice

This video is really helpful: https://www.youtube.com/watch?v=pA4BsUK3oP4&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC&index=11&ab_channel=AndrejKarpathy

Data Augmentation
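
A sketch of a typical augmentation pipeline using torchvision (the specific transforms and parameter values are illustrative, not from the lecture; AlexNet-era augmentation was random crops, horizontal flips, and color jittering):

```python
import torchvision.transforms as T

train_transforms = T.Compose([
    T.RandomResizedCrop(224),   # random crop, resized to the network input size
    T.RandomHorizontalFlip(),   # flip left-right with probability 0.5
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet channel statistics
                std=[0.229, 0.224, 0.225]),
])
```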