Image Classification

Image Classification problem: Assign a label (from a fixed set of categories) to an input image.

The image is just a 3D tensor of integers — for CIFAR-10, a $32 \times 32 \times 3$ array of pixels in $[0, 255]$ . There is a semantic gap between this raw matrix of numbers and the label “cat” that a human assigns. Hard-coding rules (“cats have pointy ears, two eyes…”) is hopeless because of:

Viewpoint variation (every pixel changes when the camera moves)
Illumination
Deformation (cats fold into arbitrary shapes)
Occlusion
Background clutter (object blending with environment)
Intra-class variation (ex: chair)

The Data-Driven Approach

Instead of hard-coding rules, collect a dataset of images + labels, train a classifier, evaluate on held-out test images. This is the API:

def train(images, labels):
    # build a model
    return model
 
def predict(model, test_images):
    # use model to predict labels
    return test_labels

Standard benchmark datasets in the lecture: MNIST (10 digits, $28 \times 28$ ), CIFAR-10 / CIFAR-100, ImageNet (1000 classes, ~1.4M images), Places365, Omniglot.

Source

CS231n Lec 2 slides 4–19 (data-driven approach, semantic gap, datasets).

🛠️ Steven Gong

Table of Contents

Image Classification

The Data-Driven Approach

Source

Graph View

Backlinks

🛠️ Steven Gong

Table of Contents

Image Classification

The Data-Driven Approach

Related

Source

Graph View

Backlinks