🛠️ Steven Gong

Sep 14, 2025, 1 min read

Visual Instruction Tuning (LLaVA)

Project page:

  • https://llava-vl.github.io/

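  • LLaVA uses CLIP (ViT-L/14) as the visual encoder. PaliGemma follows the same recipe of pairing a pretrained contrastive image encoder with an LLM, but it uses SigLIP rather than CLIP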
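
To make the role of the visual encoder concrete, here is a minimal sketch of how the encoder's patch features could be fed to the language model. It assumes the design described in the LLaVA paper, where a learned linear projection maps the patch features into the LLM's embedding space and the resulting visual tokens sit next to the text token embeddings. The function names, the tiny dimensions, and the random values are all hypothetical stand-ins, not the real CLIP or LLaVA weights.

```python
import random

def project(patch_features, weight, bias):
    """Apply a linear layer to each patch feature: out = x @ W + b."""
    columns = list(zip(*weight))  # weight has shape (vision_dim, llm_dim)
    return [
        [sum(x * w for x, w in zip(row, col)) + b for col, b in zip(columns, bias)]
        for row in patch_features
    ]

def build_sequence(visual_tokens, text_embeddings):
    """Place the projected visual tokens in front of the text token embeddings."""
    return visual_tokens + text_embeddings

# Hypothetical toy sizes; the real encoder and LLM are far larger.
random.seed(0)
num_patches, vision_dim, llm_dim, num_text_tokens = 16, 32, 64, 5

# Stand-in for the visual encoder's output: one feature vector per image patch.
patch_features = [[random.gauss(0, 1) for _ in range(vision_dim)] for _ in range(num_patches)]
# Stand-in for the learned projection parameters.
weight = [[random.gauss(0, 0.02) for _ in range(llm_dim)] for _ in range(vision_dim)]
bias = [0.0] * llm_dim
# Stand-in for the embedded text instruction.
text_embeddings = [[random.gauss(0, 1) for _ in range(llm_dim)] for _ in range(num_text_tokens)]

visual_tokens = project(patch_features, weight, bias)
sequence = build_sequence(visual_tokens, text_embeddings)
print(len(sequence), len(sequence[0]))  # 21 64
```

The point of the sketch is only the shapes: 16 patch vectors of width 32 become 16 tokens of width 64, which the LLM can then consume together with the 5 text tokens.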

Follow-up paper (LLaVA-1.5, "Improved Baselines with Visual Instruction Tuning"):

  • https://arxiv.org/abs/2310.03744
