Neural Network

Convolutional Neural Network (CNN)

A CNN is a neural network that uses spatially-shared convolutional filters instead of dense weight matrices, making it well-suited to image data.

Why CNN instead of an MLP for images?

With a vanilla MLP on raw pixels, the input dimensionality explodes (a 32×32×3 image is already a 3072-vector) and the dense weight matrices become huge. A CNN learns small filter weights that slide across the image, so parameters don't scale with image size.
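
A quick parameter count makes this concrete (a minimal sketch; the layer widths are my own illustrative choices):

```python
import torch.nn as nn

mlp = nn.Linear(32 * 32 * 3, 100)       # dense layer on a flattened 32x32x3 image
conv = nn.Conv2d(3, 10, kernel_size=5)  # ten 5x5 filters, works at any image size

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(mlp), count(conv))  # 307300 vs 760
```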

CNNs are central to Image Classification and Object Detection. When I'm ready to implement a CNN from scratch, watch this ConvNet in Practice lecture.

At the end, features are flattened and passed through an MLP for classification.

Three types of layers in a ConvNet architecture (minimal sketch after this list):

  • Convolution (CONV) layer: much of the design work is selecting hyperparameters (padding, strides, number of filters, kernel size)
  • Pooling (POOL) layer
  • Fully Connected (FC) layer
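
A minimal sketch of how the three layer types stack in PyTorch (the layer sizes here are my own illustrative choices, not from the lecture):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # CONV: 32 filters, "same" padding
    nn.ReLU(),
    nn.MaxPool2d(2),                             # POOL: 32x32 -> 16x16
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # POOL: 16x16 -> 8x8
    nn.Flatten(),                                # flatten features for the MLP head
    nn.Linear(64 * 8 * 8, 10),                   # FC: 10-class scores
)

x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # torch.Size([1, 10])
```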

CNN Architectures

  • Classic Networks
    • LeNet-5 (1998)
    • AlexNet
      • First use of ReLU
      • Used LRN layers (not common anymore)
      • Heavy data augmentation
      • Dropout 0.5
      • Batch size 128
      • SGD Momentum 0.9
      • Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
      • L2 weight decay 5e-4
      • 7-CNN ensemble: 18.2% → 15.4% top-5 error
    • VGG
  • ResNet
  • Inception
  • U-Net

Advantages:

  • Parameter sharing: a feature detector (e.g. a vertical edge detector) useful in one part of the image is probably useful elsewhere too
  • Sparsity of connections: each output value depends on only a small number of inputs

Why CNNs over MLPs for images (CS231n Lec 5)

Stretching a 32×32×3 image into a 3072-vector for an MLP destroys the spatial structure: pixels that were neighbors are no longer treated as neighbors. The conv layer instead preserves the H×W×C volume, sliding small filters across spatial locations.

Intuition

Convolution is weight sharing across positions. A filter that detects “vertical edge” in the top-left also detects it in the bottom-right, so translation equivariance is baked into the architecture instead of having to be learned from scratch. Pooling adds a bit of scale invariance on top, and stacking convs lets later layers build hierarchical features (edges → textures → parts → objects).

Each filter is Cin × KH × KW and always extends the full input depth. For a 32×32×3 input, a 5×5 filter is really 5×5×3. One filter slid across the input produces one activation map of shape 1 × H' × W'. Stacking Cout filters gives an output volume of Cout × H' × W'. The full layer params: weights Cout × Cin × KH × KW plus a Cout-dim bias vector.
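
A quick shape check of the weights and bias in PyTorch (a sketch matching the 32×32×3 example):

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5)
print(conv.weight.shape)  # torch.Size([10, 3, 5, 5]) = Cout x Cin x KH x KW
print(conv.bias.shape)    # torch.Size([10])          = Cout
```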

A ConvNet is just (Conv → ReLU → Conv → ReLU → … → optional Pool → … → FC) end-to-end with backprop.

Convolution layer: shapes and hyperparameters

| name | symbol | typical |
| --- | --- | --- |
| input channels | Cin | 3, 32, 64, … |
| output channels (# filters) | Cout | 32, 64, 128, 256 (powers of 2) |
| kernel size (usually KH = KW = K) | K | 1, 3, 5 |
| padding for “same” | P | (K − 1)/2 |
| stride | S | 1 (preserve), 2 (downsample) |

Output spatial size: W′ = (W − K + 2P) / S + 1 (and the same for H′).

Read this as: the filter needs K − 1 pixels of room to sit, padding gives it 2P extra room at the edges, and stride S says how many pixels you hop each time. Without padding, corners get visited less often than the middle; “same” padding keeps every input pixel equally weighted.
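
The formula as a one-line helper (my own sketch; integer division matches how PyTorch floors sizes):

```python
def conv_out_size(w: int, k: int, p: int = 0, s: int = 1) -> int:
    """Output width for input width w, kernel k, padding p, stride s."""
    return (w - k + 2 * p) // s + 1

print(conv_out_size(32, 5, p=2))       # 32 ("same")
print(conv_out_size(32, 3, p=1, s=2))  # 16 (downsample by 2)
```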

Common settings: 3×3 conv with P = 1 (same), 5×5 with P = 2 (same), 3×3 with S = 2, P = 1 (downsample by 2), 1×1 with P = 0.

Worked example. Input 3×32×32, ten 5×5 filters with stride 1, pad 2:

  • Output: 10×32×32 (since (32 − 5 + 2·2)/1 + 1 = 32)
  • Params: 10 × (5·5·3 + 1) = 750 weights + 10 biases = 760
  • Multiply-adds: 10·32·32 = 10,240 outputs × 5·5·3 = 75 ops each ≈ 768K
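
Verifying the worked example (a sketch):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 10, kernel_size=5, stride=1, padding=2)
x = torch.randn(1, 3, 32, 32)

print(conv(x).shape)                              # torch.Size([1, 10, 32, 32])
print(sum(p.numel() for p in conv.parameters()))  # 760
```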

1×1 convolutions

A 1×1 conv with Cout filters and input Cin × H × W produces Cout × H × W. Each output spatial cell is a Cin-dim dot product with the input cell at the same location. It’s effectively a per-pixel fully-connected layer across channels. Used everywhere as a cheap channel-mixer or channel-reducer (see ResNet bottlenecks).

Think of 1×1 convs as “rotating” the channel axis: no spatial mixing happens, they just relearn what the feature basis is at every pixel.
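
The per-pixel-FC view, checked numerically (a sketch: an nn.Linear given the same weights reproduces a 1×1 conv exactly):

```python
import torch
import torch.nn as nn

cin, cout = 64, 32
conv = nn.Conv2d(cin, cout, kernel_size=1)

fc = nn.Linear(cin, cout)                           # per-pixel FC across channels
fc.weight.data = conv.weight.data.view(cout, cin)   # copy the 1x1 conv weights
fc.bias.data = conv.bias.data

x = torch.randn(1, cin, 8, 8)
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # channels-last, apply, back
print(torch.allclose(conv(x), y_fc, atol=1e-6))       # True
```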

Receptive fields

The receptive field of an output cell is the region of the input it depends on. For one conv, it’s K × K. Stacking L conv layers (stride 1) grows the receptive field linearly: RF = 1 + L · (K − 1).

Receptive field is how far “back” one neuron sees. If a deep-layer neuron has to classify a dog, it needs enough receptive field to actually see the dog; if not, it’s trying to make a global decision from a peephole. That’s why downsampling exists: each pool doubles the receptive field without adding layers.

Receptive field "in the input" vs "in the previous layer"

These are different things. The formula above gives the receptive field in the original input; relative to the previous layer, a conv output sees only a K × K window.
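
A small helper that accumulates the receptive field through a stack of (kernel, stride) layers (my own sketch of the standard recurrence; the tap spacing grows with the product of earlier strides):

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride), first layer first."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # taps are `jump` input pixels apart
        jump *= s
    return rf

print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7: three 3x3 ~ one 7x7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: downsampling grows RF faster
```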

Problem with conv-only stacks: large images (224×224) need many layers before any output can “see” the whole image. Solution: downsample inside the network, via either strided conv or pooling.

Translation equivariance

Convolution and pooling commute with spatial translation: conv(shift(x)) = shift(conv(x)).

Intuition: a feature (edge, eye, wheel) doesn’t change identity when it moves around the image, so the same filter should fire, just at a translated location. This is the inductive bias that makes CNNs sample-efficient on images.

Note: equivariance (output translates with input) is a different property from invariance (output doesn’t change). Pooling adds a small amount of local invariance on top of equivariance.
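
A numeric check of equivariance (a sketch; circular padding plus torch.roll keeps the identity exact up to float error, since ordinary zero padding breaks it at the borders):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode="circular")
x = torch.randn(1, 3, 16, 16)
shift = lambda t: torch.roll(t, shifts=(2, 5), dims=(2, 3))  # translate H and W

print(torch.allclose(conv(shift(x)), shift(conv(x)), atol=1e-5))  # True
```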

What conv filters learn

  • Linear classifier (1 layer): one whole-image template per class
  • MLP: bank of whole-image templates (e.g. 64 hidden units = 64 templates)
  • First conv layer: small local templates, typically oriented edges and opposing colors. AlexNet’s first layer is 64 filters of size 3×11×11, and the visualization looks like a Gabor bank
  • Deeper conv layers: harder to visualize directly, but tend to respond to larger structures (eyes, letters, textures). See Feature Visualization

1D / 2D / 3D convolutions

Same idea, different spatial dimensionality.

| variant | input | weights |
| --- | --- | --- |
| 1D conv (audio, sequences) | Cin × W | Cout × Cin × K |
| 2D conv (images) | Cin × H × W | Cout × Cin × K × K |
| 3D conv (video, volumetric) | Cin × D × H × W | Cout × Cin × K × K × K |

See 3D CNN for video-specific use.
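
Weight shapes across the three variants (a sketch):

```python
import torch.nn as nn

print(nn.Conv1d(3, 8, kernel_size=3).weight.shape)  # torch.Size([8, 3, 3])
print(nn.Conv2d(3, 8, kernel_size=3).weight.shape)  # torch.Size([8, 3, 3, 3])
print(nn.Conv3d(3, 8, kernel_size=3).weight.shape)  # torch.Size([8, 3, 3, 3, 3])
```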

PyTorch API

```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                stride=1, padding=0, dilation=1, groups=1, bias=True)
```

dilation spaces out the kernel taps (à trous). groups partitions input channels into independent conv groups (groups=in_channels gives depthwise conv).
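
Depthwise conv via groups, with a shape check (a sketch): each of the 32 groups sees a single input channel, so the weight tensor’s second dim is 1:

```python
import torch.nn as nn

depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
print(depthwise.weight.shape)  # torch.Size([32, 1, 3, 3]): one 3x3 filter per channel
```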

From CS231n Lec 5 slides 42-106 (conv layer mechanics, filter dimensions, activation maps, output shape, padding, stride, receptive fields, 1×1 conv, translation equivariance, pooling, what filters learn, 1D/2D/3D, PyTorch API).

ImageNet architectures: a brief tour

From CS231n Lec 6, top-5 error on ILSVRC year by year (every drop was a paradigm shift):

| Year | Model | Top-5 err | Layers | Key idea |
| --- | --- | --- | --- | --- |
| 2010 | shallow | 28.2% | – | hand-crafted features |
| 2012 | AlexNet | 16.4% | 8 | first deep ConvNet to win (ReLU, dropout, GPU training) |
| 2013 | ZFNet | 11.7% | 8 | tweaked AlexNet hyperparameters |
| 2014 | VGG | 7.3% | 19 | small filters, deeper |
| 2014 | GoogLeNet | 6.7% | 22 | Inception modules |
| 2015 | ResNet | 3.57% | 152 | residual blocks, “revolution of depth” |
| 2017 | SENet | 2.3% | – | squeeze-and-excitation |

VGG (Simonyan & Zisserman 2014)

Design rule: only 3×3 conv (stride 1, pad 1) + 2×2 max pool (stride 2). No fancy shapes, just stack them.

Why all 3×3 instead of bigger filters? Three stacked 3×3 convs have the same effective receptive field as one 7×7 conv (1 + 3·(3-1) = 7), but with:

  • More nonlinearities (3 ReLUs instead of 1, richer feature transforms)
  • Fewer parameters. For C channels per layer: 3 · (3·3·C·C) = 27C² vs 7·7·C·C = 49C² for the single 7×7

This is the general “deep + small filters > shallow + big filters” lesson that everything since has followed.
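
Checking the parameter arithmetic (a sketch; C = 256 is just an example width):

```python
C = 256
three_3x3 = 3 * (3 * 3 * C * C)  # 27 C^2 = 1,769,472
one_7x7 = 7 * 7 * C * C          # 49 C^2 = 3,211,264
print(three_3x3, one_7x7)        # same 7x7 receptive field, ~45% fewer params
```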

AlexNet (Krizhevsky et al. 2012)

The model that started modern deep learning for vision. Notable decisions:

  • First mainstream use of ReLU
  • Heavy data augmentation, dropout 0.5
  • SGD + momentum 0.9, LR 1e-2 (manual decay by 10× when val plateaus), L2 weight decay 5e-4
  • Trained on two GTX 580 GPUs (the now-historical channel split was a memory hack)
  • 7-CNN ensemble: 18.2% → 15.4% top-5 error

From CS231n Lec 6 slides 33-45 (ImageNet history chart, VGG case study with effective receptive field) and 46-58 (ResNet, see paper note).

Some Ideas

“Flattening” out the pooling layer.

ConvNet in Practice

Really helpful video.

Data Augmentation