Daily Research Skill Honing
“If we lose the details, we lose it all.”
Beyond honing your taste by reading papers every day, you also need to hone your engineering skills, which are how you take your ideas and execute on them. Now, with the rise of AI coding tools, it’s become extremely easy to just say what you want and have the code manifest itself. That said, this can become a very big crutch, because you don’t actually understand what the model is doing, so the bridge between your ideas and their implementation goes missing. Now, how important is understanding implementation detail?
I still stand by Code is Truth. Actually, I think code is just a lower-level abstraction of your idea, and to be a 10x developer, you need to be able to jump between levels of abstraction very quickly. In this case, the low-level abstraction is your PyTorch code (you can go deeper if you are a kernel engineer, but that’s for another day).
So how should we hone our engineering skills? I think the first step is laying a better foundation, because my PyTorch foundation is honestly pretty shitty. I just need to do more fundamental projects and get experience training models (I did some of this at Dyna Robotics and a long time ago, but you just need to build shit).
How do you know if you can use AI?
When you’re able to write up the code in your head and you know exactly how to do it, that’s when you can rely on AI. I’m very comfortable doing this with C++ / ROS, for example, but for PyTorch code I’m still leaning on the crutch. So you need to acquire those skills first.
Also, Faraz mentioned an idea I should try: run mini-experiments that aren’t research-worthy but are extremely insightful.
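For instance (my example, not necessarily what Faraz had in mind): the classic overfit-a-single-batch check. It isn’t publishable, but it instantly tells you whether your training loop, loss, and optimizer wiring are correct. Model and data here are arbitrary stand-ins:

```python
# Mini-experiment sketch: can a tiny MLP overfit a single fixed batch?
# Not research-worthy, but it verifies the loop/loss/optimizer wiring.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))  # one fixed batch

for step in range(500):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(loss.item())  # should be ~0; if not, something in the loop is broken
```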
Things you need to do from scratch (i.e. you are only allowed NumPy and PyTorch):
Inspirations:
- Anand’s tinyworld: https://x.com/Almondgodd/status/1971314283184259336
- https://github.com/adam-maj/deep-learning/
Datasets
Image datasets (standard ones; loading sketch after this list)
- MNIST (28x28, ~6000 images per class, 10 classes)
  - digits 0-9
- CIFAR-10 (60000 32x32 colour images in 10 classes, 6000 images per class)
  - classes such as ship, dog, deer, bird, etc.
- CIFAR-100 (100 classes, 600 images each)
- COCO (image captioning)
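A quick sketch of loading the standard ones above. I’m assuming torchvision is fair game purely for downloading/loading, since the NumPy-and-PyTorch-only rule is really about the modeling:

```python
# Loading standard image datasets; torchvision is assumed here only as a
# data loader, the modeling itself stays numpy/pytorch.
import torch
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # HWC uint8 -> CHW float in [0, 1]

mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

loader = torch.utils.data.DataLoader(cifar10, batch_size=128, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([128, 3, 32, 32])
```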
Text datasets
- ?
Chess environment (needs GPU): https://www.youtube.com/watch?v=P6sfmUTpUmc
Architectures to do from scratch:
- SimCLR
- BYOL
- JEPA
- kNN (but with images?)
- Intuition for gradients
- NN (numpy backprop sketch after this list)
- MCTS
- RNNs
- CNNs
- LSTMs
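For the NN and intuition-for-gradients items, here’s roughly the exercise I mean: a two-layer net in pure numpy with backprop written out by hand on toy data (all sizes and the toy target are arbitrary):

```python
# Two-layer net + manual backprop in pure numpy (toy regression data).
# Writing the gradients out by hand is the whole point of the exercise.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 2))
y = X[:, :1] ** 2 + X[:, 1:]                # toy target, shape (256, 1)

W1, b1 = rng.normal(size=(2, 32)) * 0.1, np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)) * 0.1, np.zeros(1)
lr = 0.1

for step in range(2000):
    # forward
    h = np.maximum(X @ W1 + b1, 0.0)        # ReLU hidden layer
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    # backward (chain rule, written out by hand)
    dpred = 2.0 * (pred - y) / len(X)       # dL/dpred
    dW2, db2 = h.T @ dpred, dpred.sum(0)
    dh = dpred @ W2.T
    dh[h <= 0] = 0.0                        # ReLU gradient mask
    dW1, db1 = X.T @ dh, dh.sum(0)

    # vanilla SGD update, in place
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= lr * g

print(loss)  # should shrink steadily if the gradients are right
```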
Transformers (attention sketch after this list)
- 01-transformer
- 02-bert
- 03-t5
- 04-gpt
- 05-lora
- 06-rlhf
- 07-vision-transformer
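For reference, the core of 01-transformer is roughly this: a minimal single-head self-attention block in PyTorch (no masking, no multi-head split, sizes arbitrary):

```python
# Minimal single-head self-attention (the core of 01-transformer).
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused q, k, v projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                           # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        attn = scores.softmax(dim=-1)                # (batch, seq, seq)
        return self.out(attn @ v)

x = torch.randn(2, 8, 64)
print(SelfAttention(64)(x).shape)  # torch.Size([2, 8, 64])
```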
Image captioning
- Image captioning on COCO
Image generation
- 01-gan
- 02-vae (loss sketch after this list)
- 03-diffusion
  - Latent diffusion
- 04-clip
- 05-dall-e
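And for 02-vae, the objective in isolation is roughly this: the reparameterization trick plus the ELBO, with a deliberately tiny linear encoder/decoder and MNIST-shaped inputs assumed:

```python
# Minimal VAE on flattened 28x28 inputs: reparameterization trick + ELBO.
# Sizes are arbitrary; this shows the objective, not a full training run.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, d_in=784, d_z=16):
        super().__init__()
        self.enc = nn.Linear(d_in, 2 * d_z)   # outputs mean and log-variance
        self.dec = nn.Linear(d_z, d_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = torch.sigmoid(self.dec(z))
        rec_loss = F.binary_cross_entropy(recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return (rec_loss + kl) / x.size(0)    # negative ELBO per example

x = torch.rand(32, 784)  # stand-in for a batch of flattened MNIST images
print(VAE()(x).item())
```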