Tensor Parallelism
Tensor parallelism is a technique for fitting a large model across multiple GPUs by splitting its weight tensors among them.
https://huggingface.co/docs/text-generation-inference/en/conceptual/tensor_parallelism
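As a rough illustration of the idea, the sketch below simulates tensor parallelism on the CPU with NumPy. It assumes a single linear layer whose weight matrix is split column-wise, with each shard standing in for one GPU. The function name `column_parallel_matmul` and the shapes are hypothetical, chosen only for this example, and it is not the implementation used by text-generation-inference.

```python
import numpy as np

def column_parallel_matmul(x, weight, num_shards):
    """Simulate column-wise tensor parallelism for y = x @ weight.

    Each shard holds a slice of the weight's columns (as one GPU would),
    computes its part of the output, and the parts are concatenated.
    Assumes the number of columns is divisible by num_shards.
    """
    # Split the weight's columns into equal shards, one per simulated device.
    shards = np.split(weight, num_shards, axis=1)
    # Every device sees the full input but only its own slice of the weights.
    partial_outputs = [x @ shard for shard in shards]
    # Gathering the slices along the column axis rebuilds the full output.
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))       # batch of 4 inputs, hidden size 8
weight = rng.standard_normal((8, 6))  # linear layer with 6 output features

sharded = column_parallel_matmul(x, weight, num_shards=2)
full = x @ weight
print(np.allclose(sharded, full))
```

In this sketch each shard stores only half of the weight matrix, which is the sense in which a model too large for one device can still fit across several. On real hardware the concatenation step would be a collective communication operation between GPUs.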