Mar 17, 2025


GPT-3

Generative Pre-trained Transformer 3 is an autoregressive language model that uses deep learning to produce human-like text.
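"Autoregressive" means the model produces text one token at a time, each step conditioned on everything generated so far. A minimal greedy-decoding sketch, where the `model` callable returning next-token logits is hypothetical and stands in for any GPT-style network:

```python
def generate(model, tokens, n_new_tokens):
    """Greedy autoregressive decoding: repeatedly append the most likely next token."""
    for _ in range(n_new_tokens):
        logits = model(tokens)  # next-token logits for the current prefix (hypothetical model API)
        next_token = max(range(len(logits)), key=logits.__getitem__)  # argmax over the vocabulary
        tokens = tokens + [next_token]  # the new token becomes context for the next step
    return tokens
```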

175B parameters (over 100x larger than GPT-2's 1.5B)
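As a rough sanity check on the 175B figure, the common rule of thumb of ~12 · n_layers · d_model² transformer weights (attention plus MLP, ignoring embeddings and biases) recovers the headline number from GPT-3's published hyperparameters; the snippet below is an approximation, not the paper's exact accounting:

```python
# Back-of-the-envelope parameter count for GPT-3 175B.
n_layers = 96    # transformer layers in GPT-3 175B
d_model = 12288  # model (embedding) dimension

# Per layer: ~4*d^2 attention weights + ~8*d^2 MLP weights = 12*d^2,
# ignoring embeddings, layer norms, and biases.
approx_params = 12 * n_layers * d_model**2
print(f"{approx_params / 1e9:.1f}B")  # ~173.9B, close to the quoted 175B
```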

Interesting new phenomena (appearing only once the network is large enough; see the prompt sketch after this list):

  • In-Context Learning
  • Chain-of-Thought
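A rough sketch of what these look like at the prompt level: in-context learning packs a few input→output demonstrations into the prompt and lets the model continue the pattern, while chain-of-thought additionally spells out intermediate reasoning steps in the demonstrations. The prompts below are illustrative, not taken from the GPT-3 paper:

```python
# In-context (few-shot) learning: the model infers the task
# (English -> French translation) purely from the demonstrations.
few_shot_prompt = """\
English: cheese
French: fromage

English: house
French: maison

English: book
French:"""

# Chain-of-thought: the demonstration includes intermediate reasoning
# steps, nudging the model to reason step by step before answering.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A:"""
```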

Resources

  • https://dugas.ch/artificial_curiosity/GPT_architecture.html

Note that embedding dimensions in practice are much larger than the 2-dimensional toy examples used for intuition: GPT-3's 175B model uses an embedding dimension of 12288.
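For a sense of scale at that width, a quick back-of-the-envelope on the token-embedding table alone (the 50257 vocabulary size is the GPT-2/GPT-3 BPE vocab; the calculation is just illustrative):

```python
vocab_size = 50257  # GPT-2/GPT-3 BPE vocabulary size
d_model = 12288     # embedding dimension of GPT-3 175B

# The token-embedding matrix alone is vocab_size x d_model weights.
embedding_params = vocab_size * d_model
print(f"{embedding_params / 1e6:.0f}M parameters")  # ~618M just for the embedding table
```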
