🛠️ Steven Gong


Feb 11, 2026, 1 min read

Model Parallel

Tensor Parallelism

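Tensor parallelism is a technique for fitting a large model across multiple GPUs. The weight tensors of each layer are split into shards, each GPU holds one shard and computes its part of the layer, and the partial results are then combined.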
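To make the idea concrete, here is a minimal sketch of one common variant, a column-wise split of a single weight matrix. It assumes the weight's columns divide evenly across devices and simulates the "devices" as list entries on one machine, so no real GPUs or communication are involved. All function names (`split_columns`, `tensor_parallel_matmul`, and so on) are hypothetical and chosen for illustration only.

```python
# Simulates column-wise tensor parallelism with plain Python lists.
# Each "device" is just a list entry; this is an illustration, not a
# real multi-GPU implementation.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_devices):
    """Split w column-wise into num_devices equal shards."""
    cols = len(w[0])
    assert cols % num_devices == 0, "columns must divide evenly across devices"
    width = cols // num_devices
    return [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]

def concat_columns(parts):
    """Concatenate per-device outputs along the column axis."""
    return [sum((p[i] for p in parts), []) for i in range(len(parts[0]))]

def tensor_parallel_matmul(x, w, num_devices):
    """Compute x @ w by giving each device one column shard of w."""
    shards = split_columns(w, num_devices)
    partial_outputs = [matmul(x, shard) for shard in shards]  # one per device
    return concat_columns(partial_outputs)

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

full = matmul(x, w)                                     # unsharded reference
sharded = tensor_parallel_matmul(x, w, num_devices=2)   # two simulated devices
print(full == sharded)
```

Each simulated device only ever stores half of `w`, which is the memory saving that lets a model too large for one GPU fit across several. In a real setup the concatenation step would be a collective communication operation between GPUs rather than a list join.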

https://huggingface.co/docs/text-generation-inference/en/conceptual/tensor_parallelism


Backlinks

  • Model Parallelism
