🛠️ Steven Gong


Feb 11, 2026, 1 min read

Visual Instruction Tuning (LLaVA)

Project page:

  • https://llava-vl.github.io/

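  • LLaVA uses CLIP (ViT-L/14) as its visual encoder. PaliGemma follows the same pattern of pairing a pretrained contrastive image encoder with a language model, though it uses SigLIP rather than CLIP.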
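
To make the role of the visual encoder concrete, here is a minimal sketch of how encoder outputs can be connected to a language model. It assumes the setup described in the original LLaVA paper, where a single trainable linear projection maps each image patch feature into the language model's word-embedding space, and the projected visual tokens are placed alongside the text embeddings. All names and sizes (`project_patches`, `build_input_sequence`, the toy dimensions) are hypothetical and chosen for illustration; the random vectors stand in for real CLIP features and real text embeddings.

```python
import random


def project_patches(patch_features, weight, bias):
    """Apply a linear map to every patch feature (hypothetical helper).

    weight holds one row per output dimension, so each output value is
    the dot product of a patch with one row, plus the matching bias.
    """
    return [
        [sum(x * w for x, w in zip(patch, row)) + b for row, b in zip(weight, bias)]
        for patch in patch_features
    ]


def build_input_sequence(visual_tokens, text_embeddings):
    """Place the projected visual tokens before the text embeddings."""
    return visual_tokens + text_embeddings


# Toy sizes for illustration only; real models use far larger dimensions.
random.seed(0)
NUM_PATCHES, VISION_DIM, LLM_DIM, NUM_TEXT = 4, 3, 2, 5

# Stand-ins for visual encoder outputs and language model text embeddings.
patch_features = [[random.random() for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[random.random() for _ in range(LLM_DIM)] for _ in range(NUM_TEXT)]

# Projection parameters (these would be learned during training).
weight = [[random.random() for _ in range(VISION_DIM)] for _ in range(LLM_DIM)]
bias = [0.0] * LLM_DIM

visual_tokens = project_patches(patch_features, weight, bias)
sequence = build_input_sequence(visual_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))  # 9 2
```

The point of the sketch is only the shape bookkeeping: after projection, every visual token has the same width as a text embedding, so the language model can consume both in one sequence.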

Follow-up paper (LLaVA-1.5, "Improved Baselines with Visual Instruction Tuning"):

  • https://arxiv.org/abs/2310.03744
