Computer Vision

Computer Vision covers everything related to cameras and the images they produce. In recent years, however, LiDAR technology has advanced, so the more general problem is referred to as Perception.

See Camera for historical context.

How can humans actually see?

I really enjoyed Cyrill Stachniss's course, which taught me this material really well. The first lecture, on cameras, was particularly helpful.

Resources

Courses
- CS231N by Stanford
- Deep Learning for Computer Vision by Justin Johnson
- Deep Learning Specialization by Andrew Ng
Slides by F1TENTH are a good overview of Classical Methods and Deep Learning Methods (also available online in video format)
Slides from CMU 16-385: https://www.cs.cmu.edu/~16385/s17/

Concepts

Reading Group

Vision Transformer (ViT)

An image is worth 16x16 words paper
- https://arxiv.org/abs/2010.11929
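The title refers to the core trick: ViT cuts an image into fixed-size patches (16x16 pixels) and treats each flattened patch as one token in a sequence. Below is a minimal plain-Python sketch of just that patchify step, assuming an H x W x C image stored as nested lists (the function name `patchify` is illustrative). In the actual model, each flattened patch is then linearly projected to an embedding and position embeddings are added before the Transformer encoder.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into a sequence of flattened,
    non-overlapping patch_size x patch_size patches in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append all C channel values of this pixel
            patches.append(patch)
    return patches


# Dummy 224 x 224 RGB image (all zeros) standing in for real pixel data.
image = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
tokens = patchify(image, 16)
print(len(tokens), len(tokens[0]))  # 196 768
```

For a 224 x 224 RGB input this gives (224 / 16)^2 = 196 tokens, each a vector of 16 x 16 x 3 = 768 values.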
Image super-resolution / deblurring (SRCNN): https://arxiv.org/abs/1501.00029

Masked Autoencoders

https://arxiv.org/abs/2111.06377
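MAE pretrains a ViT-style encoder by hiding a large random subset of image patches (75% in the paper) and having a lightweight decoder reconstruct the missing pixels; the encoder only sees the visible patches. The sketch below covers just the random masking step over patch indices. It is a standalone illustration under that assumption, not the official implementation, and the parameter names are my own.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) sets by uniform random sampling.

    Only the visible patches would be fed to the encoder; the masked ones are
    the reconstruction targets for the decoder.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


# 196 patches corresponds to a 224 x 224 image cut into 16 x 16 patches.
visible, masked = random_masking(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```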

For 2D images, CNNs are the default choice, and Transformers are sometimes used instead.

PV-RCNN

Datasets

COCO (200,000+ labeled images)

I heard somewhere that you usually don’t train an entire CNN from scratch, since that requires millions of labeled examples that you don’t have. Instead, you build on a backbone network that has already been trained and adapt it to your own task. I later found that this is the idea of Transfer Learning (see that page for actual results).
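A minimal, dependency-free sketch of the "freeze the backbone, train a new head" form of transfer learning. `pretrained_backbone` is a hypothetical placeholder with hand-written features; in practice it would be a network pretrained on a large dataset such as ImageNet, with its final classification layer removed. Only the weights of the new logistic-regression head are learned, which is why far fewer labeled examples are needed than when training everything from scratch.

```python
import math


def pretrained_backbone(x):
    """HYPOTHETICAL stand-in for a frozen, pretrained feature extractor
    (e.g. a CNN with its classification layer removed). Its 'weights' are
    hard-coded here and are never updated during training."""
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry is a constant bias feature


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=100, lr=0.1):
    """Fit a new logistic-regression head on top of the frozen backbone."""
    # The backbone is frozen, so its features can be computed once up front.
    feats = [(pretrained_backbone(x), y) for x, y in data]
    w = [0.0] * len(feats[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            for i, fi in enumerate(f):
                # Gradient of the mean binary cross-entropy w.r.t. the head weights.
                grad[i] += (p - y) * fi / len(feats)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]  # only the head is updated
    return w


def predict(w, x):
    return int(sum(wi * fi for wi, fi in zip(w, pretrained_backbone(x))) > 0)


# Tiny labeled set for the new task: label 1 when x0 + x1 > 0.
data = [((1.0, 1.0), 1), ((2.0, 0.0), 1), ((0.0, 2.0), 1),
        ((-1.0, -1.0), 0), ((0.0, -2.0), 0), ((-2.0, 0.0), 0)]
head = train_head(data)
print([predict(head, x) for x, _ in data])  # [1, 1, 1, 0, 0, 0]
```

Full fine-tuning goes one step further and also updates some or all of the backbone weights, usually with a small learning rate.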

Tips for doing well on benchmarks/winning competitions

Taken from the CS231n course.

Ensembling (rarely used in production, because running several networks on every input is too computationally expensive)
- Train several networks (3-15 networks) independently and average their outputs
Multi-crop at test time
- Run the classifier on multiple versions (e.g. crops) of each test image and average the results (a form of ensembling)
Use network architectures published in the literature
Use open-source implementations if possible (their authors have already worked out the finicky details, such as learning rate settings)
Use pretrained models and fine-tune on your dataset
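Ensembling and multi-crop testing both boil down to averaging class probabilities: over several independently trained networks, or over several views of one test image. A plain-Python sketch, assuming each model is any callable that maps an input to a list of class logits; the models in the usage lines are hypothetical. The "four corners + center, plus horizontal flips" ten-crop scheme is one common choice of views, not the only one.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_probs(prob_lists):
    """Element-wise mean of several class-probability vectors."""
    return [sum(ps) / len(prob_lists) for ps in zip(*prob_lists)]


def ensemble_predict(models, x):
    """Ensembling: average the softmax outputs of independently trained models."""
    return average_probs([softmax(model(x)) for model in models])


def five_crops(image, size):
    """Four corner crops plus a center crop of a 2D image (list of rows)."""
    h, w = len(image), len(image[0])
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]
    return [[row[left:left + size] for row in image[top:top + size]]
            for top, left in offsets]


def multi_crop_predict(model, image, size):
    """Multi-crop testing: average predictions over 5 crops and their mirror images."""
    crops = five_crops(image, size)
    crops += [[row[::-1] for row in crop] for crop in crops]  # horizontal flips
    return average_probs([softmax(model(crop)) for crop in crops])


# Hypothetical "models" that return fixed logits for 2 classes.
models = [lambda x: [0.0, 0.0], lambda x: [0.0, math.log(3.0)]]
print(ensemble_predict(models, None))  # roughly [0.375, 0.625]
```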

🛠️ Steven Gong
