GGUF

https://github.com/ggml-org/ggml/blob/master/docs/gguf.md

GGUF is a binary file format for storing models for inference with GGML and GGML-based executors. It is designed for fast loading and saving of models, and for ease of reading. Models are typically developed in PyTorch or another framework and then converted to GGUF for use with GGML.
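As a quick sanity check of that binary layout: per the spec linked above, a GGUF file begins with the 4-byte magic "GGUF" followed by a uint32 format version (little-endian by default). You can inspect a local file like so; the filename model.gguf is just a placeholder:

head -c 4 model.gguf                  # prints: GGUF
od -An -t u4 -j 4 -N 4 model.gguf     # prints the format version as a uint32 (on a little-endian machine), e.g. 3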

Hugging Face info on GGUF:

Browse GGUF models:

  • hf.co/models?library=gguf

If Hugging Face hosts the model, you can fetch and run it in one step. The -hf argument takes a Hugging Face repo name, optionally followed by :<quant> to select a specific quantization from that repo (here Q8_0):

llama-cli -hf bartowski/Llama-3.2-3B-Instruct-GGUF:Q8_0

However, you will often want to start from a base model yourself and try different quantization levels, producing several GGUF variants to compare; a sketch of that workflow follows.
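A minimal sketch of the convert-then-quantize workflow, assuming a llama.cpp checkout with its tools on PATH and a Hugging Face model already downloaded to ./my-model (the paths and output filenames here are illustrative):

# Convert the original (e.g. safetensors/PyTorch) weights to a full-precision GGUF file.
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf --outtype f16

# Produce quantized variants to compare size/speed/quality trade-offs.
llama-quantize my-model-f16.gguf my-model-Q8_0.gguf Q8_0
llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M

Each llama-quantize invocation writes a new standalone GGUF file, so you can load any variant directly with llama-cli -m my-model-Q4_K_M.gguf and judge the trade-off for yourself.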