🛠️ Steven Gong

May 06, 2025, 1 min read

Vision-Language Model (VLM)

Resources

  • https://huggingface.co/blog/vision_language_pretraining

Models

  • PaliGemma
  • LLaVA
  • Google Gemini

Another example is Prismatic VLM.

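In robotics, we fine-tune these VLMs on robot data to make Vision-Language Action Models (VLAs), which take images and language instructions as input and output robot actions.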
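
One way to picture the VLM-to-VLA step is action tokenization: each continuous action dimension is discretized into a fixed number of bins, so that a language model head can predict actions as if they were tokens. The sketch below is a minimal illustration under assumptions this note does not state: 256 bins, actions normalized to [-1, 1], and the hypothetical helper names `action_to_tokens` and `tokens_to_action`. It is not the scheme of any particular model listed above.

```python
NUM_BINS = 256  # assumed number of action bins (not taken from this note)


def action_to_tokens(action, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin index."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep value inside [low, high]
        fraction = (clipped - low) / (high - low)  # rescale to [0, 1]
        # The top edge (fraction == 1.0) would land in bin num_bins, so cap it.
        tokens.append(min(int(fraction * num_bins), num_bins - 1))
    return tokens


def tokens_to_action(tokens, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Decode bin indices back to the centre value of each bin."""
    width = (high - low) / num_bins
    return [low + (token + 0.5) * width for token in tokens]


if __name__ == "__main__":
    # Hypothetical 3-dimensional action, e.g. an end-effector displacement.
    action = [0.3, -0.7, 0.05]
    tokens = action_to_tokens(action)
    print(tokens)
    print(tokens_to_action(tokens))
```

Decoding returns bin centres, so the round trip loses at most half a bin width per dimension. With these assumed settings that is 1/256 of the action range on each side.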

Created with Quartz, © 2025
