Data Augmentation
Some common techniques for making a CV model more robust, or for getting more data out of what you already have:
- Mirroring
- Cropping various parts of the image
- Color shifting: add offsets to the R, G, B channel values, e.g.:
  - +20, -20, +20
  - -20, +20, +20
  - +5, 0, +5
- Advanced: PCA color augmentation
For implementation, the augmented data is usually not precomputed and stored on disk; instead, CPU threads generate augmented examples on the fly while the model trains on the GPU.
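A minimal numpy sketch of the on-the-fly idea: each data-loading thread would apply a random transform per image. The `augment` function, the ±20 shift range, and the 0.5 flip probability are illustrative choices, not a fixed recipe.

```python
import numpy as np

def augment(img, rng):
    """Randomly mirror and color-shift one image.

    img: uint8 array of shape (H, W, 3). Returns a float32 copy.
    """
    out = img.astype(np.float32)
    # Mirroring: flip left-right with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Color shifting: one random offset per RGB channel, here in [-20, 20].
    shift = rng.uniform(-20, 20, size=3)
    out = np.clip(out + shift, 0, 255)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = augment(img, rng)
```

In a real pipeline this would run inside each dataloader worker, so the GPU never waits on augmentation.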
Data preprocessing (CS231n Lec 6)
TLDR for image normalization: subtract per-channel mean, divide by per-channel std. The stats are 3 numbers (one per RGB channel) precomputed from the training set.
```
norm_pixel[i,j,c] = (pixel[i,j,c] - mean[c]) / std[c]   # mean, std precomputed over the training set
```
Almost all modern image models do this: it puts all input channels on the same scale and centers them, which keeps activations well-behaved through the network (same logic as why init matters).
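The per-channel recipe can be sketched end-to-end in numpy; the toy `train` array and its shapes are illustrative stand-ins for a real training set.

```python
import numpy as np

# Hypothetical training set: N images of H x W x 3, values in [0, 255].
train = np.random.default_rng(0).integers(0, 256, size=(100, 8, 8, 3)).astype(np.float32)

# Precompute one mean and one std per channel over the whole training set.
mean = train.mean(axis=(0, 1, 2))   # shape (3,)
std = train.std(axis=(0, 1, 2))     # shape (3,)

def normalize(img):
    # Broadcasting applies the per-channel stats to every pixel.
    return (img - mean) / std

x = normalize(train[0])
```

The same `mean` and `std` computed on the training set are reused at test time; test images never get their own statistics.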
Augmentation as regularization
The general regularization pattern: add randomness at train time, average it out at test time. Augmentation fits the same recipe as dropout: sample a random transform during training, evaluate the deterministic original (or average a fixed set of crops) at test.
Standard augmentations
- Horizontal flip: free for most natural-image classes (cats look like cats either way; doesn't apply to text or directional data).
- Random crops + scales: ResNet recipe: pick L in [256, 480], resize the short side to L, sample a 224×224 patch. At test, average over 5 scales × 10 crops (4 corners + center, with flips).
- Color jitter: randomize contrast/brightness; advanced version uses PCA over RGB pixels.
- Cutout: zero out random rectangular regions of the input. Works very well on small datasets like CIFAR; less common on ImageNet, where it is less needed. (DeVries & Taylor 2017)
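A minimal cutout sketch in numpy, following the idea in DeVries & Taylor 2017. The patch size and the choice to sample the square's center uniformly (letting the square clip at image borders) are assumptions of this sketch.

```python
import numpy as np

def cutout(img, size, rng):
    """Zero out one random size x size square, clipped at the borders.

    img: float array of shape (H, W, C). Returns a modified copy.
    """
    h, w = img.shape[:2]
    # Sample the square's center uniformly over the image.
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y0:y1, x0:x1, :] = 0.0
    return out

rng = np.random.default_rng(0)
img = np.ones((32, 32, 3), dtype=np.float32)
aug = cutout(img, 16, rng)
```

At test time no region is zeroed; cutout is train-time randomness only, matching the regularization framing above.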
Source
CS231n Lec 6, slide 68 (image preprocessing), slides 69–76 (augmentation: flips, random crops/scales, color jitter, cutout, regularization framing).