Computer Vision

Data Augmentation

Some common techniques for making a CV model more robust, or for squeezing more data out of a limited dataset:

  • Mirroring
  • Cropping various parts of the image
  • Color shifting: add fixed offsets to the R, G, B channels, e.g.
    • +20, -20, +20
    • -20, +20, +20
    • +5, 0, +5
    • Advanced → PCA color augmentation
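A minimal sketch of the simple (non-PCA) version of color shifting, assuming 8-bit RGB images; `color_shift` is a hypothetical helper name:

```python
import numpy as np

def color_shift(img, offsets=(20, -20, 20)):
    # img: uint8 array of shape (H, W, 3); offsets are per-channel pixel shifts
    # (hypothetical helper, not from the lecture)
    shifted = img.astype(np.int16) + np.array(offsets, dtype=np.int16)
    return np.clip(shifted, 0, 255).astype(np.uint8)  # stay in valid uint8 range

img = np.full((4, 4, 3), 128, dtype=np.uint8)   # flat gray image
out = color_shift(img, (+20, -20, +20))          # R=148, G=108, B=148
```

The cast to a signed dtype before adding matters: adding a negative offset directly to uint8 would wrap around instead of darkening the channel.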

For implementation, augmentation is done on the fly rather than by writing augmented copies to disk: CPU threads load images and apply random distortions while the GPU trains on the previous batch.
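A toy sketch of that producer/consumer pattern using only the standard library, assuming a loader thread that would normally read JPEGs from disk (the zero array below is a stand-in for the disk read):

```python
import queue
import threading
import numpy as np

def augment(img):
    # random horizontal flip as a stand-in for the full distortion pipeline
    return img[:, ::-1] if np.random.rand() < 0.5 else img

def loader(n_images, out_q):
    # a real loader thread would read and decode images from disk here
    for _ in range(n_images):
        img = np.zeros((224, 224, 3), dtype=np.uint8)  # stand-in for a disk read
        out_q.put(augment(img))  # blocks if the queue is full

q = queue.Queue(maxsize=8)  # bounded queue so the loader can't run ahead forever
t = threading.Thread(target=loader, args=(4, q), daemon=True)
t.start()
batch = [q.get() for _ in range(4)]  # the training loop consumes from the queue
```

The bounded queue gives backpressure: the loader thread keeps the queue topped up, and the training loop never waits on disk as long as augmentation keeps pace with the GPU.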

Data preprocessing (CS231n Lec 6)

TLDR for image normalization: subtract per-channel mean, divide by per-channel std. The stats are 3 numbers (one per RGB channel) precomputed from the training set.

norm_pixel[i,j,c] = (pixel[i,j,c] - mean[c]) / std[c]   # mean, std precomputed over the training set

Almost all modern image models do this: it puts all input channels on the same scale and centers them, which keeps activations well-behaved through the network (same logic as why init matters).
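Spelled out in NumPy, assuming the training set is one `(N, H, W, 3)` float array (the random data below is just a stand-in):

```python
import numpy as np

# toy stand-in for the training set: (N, H, W, 3)
train = np.random.rand(16, 8, 8, 3).astype(np.float32)

# three numbers each, computed ONCE over the whole training set
mean = train.mean(axis=(0, 1, 2))  # shape (3,)
std = train.std(axis=(0, 1, 2))    # shape (3,)

def normalize(img):
    # broadcasts the (3,) stats across every pixel of a (H, W, 3) image
    return (img - mean) / std

normed = normalize(train)  # also works on the whole batch, via broadcasting
```

The same `mean` and `std` are reused at validation/test time; recomputing them per image or per batch would leak information and change the input distribution between train and test.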

Augmentation as regularization

The general regularization pattern: add randomness at train, average it out at test. Augmentation fits the same recipe as dropout: sample a random transform during training, evaluate the deterministic original (or average a fixed set of crops) at test.
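The test-time half of the recipe can be sketched like this, with a hypothetical flip-averaging predictor and a dummy `model` standing in for a real network:

```python
import numpy as np

def predict_tta(model, img):
    # test time: average the model over a FIXED set of views (here: original + mirror),
    # instead of sampling a random transform as during training
    return 0.5 * (model(img) + model(img[:, ::-1]))

# dummy "model": the mean pixel value (hypothetical stand-in, happens to be flip-invariant)
model = lambda x: x.mean()

img = np.arange(12, dtype=np.float32).reshape(2, 2, 3)
p = predict_tta(model, img)  # equals model(img) here, since the mean ignores orientation
```

The ResNet-style 10-crop evaluation mentioned below is the same idea with a larger fixed set of views (corners, center, and their flips) in place of just the mirror.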

Standard augmentations

  • Horizontal flip: free for most natural-image classes (cats look like cats either way; doesn't apply to text or directional data).
  • Random crops + scales: the ResNet recipe picks L in [256, 480], resizes the short side to L, then samples a random 224×224 patch. At test, average over 5 scales × 10 crops (4 corners + center, each with its flip).
  • Color jitter: randomize contrast/brightness; advanced version uses PCA over RGB pixels.
  • Cutout: zero out random rectangular regions of the input. Works very well on small datasets like CIFAR; less common on large datasets like ImageNet. (DeVries & Taylor 2017)
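A minimal sketch of cutout, assuming float images in `(H, W, C)` layout; the square is clipped at the image borders, matching the usual implementation:

```python
import numpy as np

def cutout(img, size=8, rng=np.random):
    # zero out one random size x size square, clipped at the borders
    h, w = img.shape[:2]
    cy, cx = rng.randint(h), rng.randint(w)          # random center
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y0:y1, x0:x1] = 0
    return out

img = np.ones((32, 32, 3), dtype=np.float32)
aug = cutout(img, size=8)  # same shape, with one zeroed patch
```

Because the mask is applied to the input only, nothing changes at test time; the network simply sees the clean image, consistent with the "randomness at train, deterministic at test" framing above.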

Source

CS231n Lec 6 slides 68 (image preprocessing), 69–76 (augmentation: flips, random crops/scales, color jitter, cutout, regularization framing).