Quantization

Post-Training Quantization (PTQ)

From this blog: https://semianalysis.com/2024/01/11/neural-network-quantization-and-number/

Post-training quantization (PTQ) does not require any actual training steps; it simply converts the already-trained weights using relatively simple algorithms:

  • The easiest is round-to-nearest (RTN): simply round each weight to the nearest representable value (a minimal sketch follows this list).
  • LLM.int8() quantizes all but a small minority of outlier feature dimensions to INT8, keeping the outliers in 16-bit (the second sketch below illustrates this decomposition).
  • GPTQ uses second-order information about the weight matrices to quantize more accurately.
  • SmoothQuant applies a mathematically equivalent transformation that smooths out activation outliers by migrating part of their scale into the weights.
  • AWQ uses information about activations to quantize the most salient weights more accurately.
  • QuIP preprocesses model weights to make them less sensitive to quantization.
  • AdaRound optimizes the rounding of each layer separately as a quadratic binary optimization.
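
As a concrete illustration of the round-to-nearest baseline, here is a minimal NumPy sketch assuming symmetric per-tensor absmax scaling; the function names and the example matrix shape are placeholders, not from the blog:

```python
import numpy as np

def round_to_nearest_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor round-to-nearest quantization to INT8."""
    # Scale so the largest-magnitude weight maps to the INT8 limit (127).
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original FP32 weights."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the rounding error.
w = (np.random.randn(1024, 1024) * 0.02).astype(np.float32)
q, scale = round_to_nearest_int8(w)
w_hat = dequantize(q, scale)
print("max abs rounding error:", np.abs(w - w_hat).max())
```

In practice, using one scale per output channel (or per small group of weights) instead of one per tensor already reduces the rounding error noticeably; the more elaborate methods above go further by exploiting activation or curvature information.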
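Below is also a toy sketch of the mixed-precision decomposition idea behind LLM.int8(). This is not the bitsandbytes implementation; the 6.0 threshold, the absmax scaling choices, and the shapes in the example are assumptions for illustration only:

```python
import numpy as np

def int8_matmul_with_outliers(x, w, threshold=6.0):
    """Toy mixed-precision matmul in the spirit of LLM.int8().

    Activation columns whose largest magnitude exceeds `threshold` are kept
    in full precision; the remaining dimensions are quantized to INT8 with
    absmax scaling (per row of x, per column of w) and multiplied, then
    dequantized.
    """
    outlier = np.abs(x).max(axis=0) > threshold
    regular = ~outlier

    # Full-precision path for the outlier feature dimensions.
    y = x[:, outlier] @ w[outlier, :] if outlier.any() else 0.0

    if regular.any():
        x_r, w_r = x[:, regular], w[regular, :]
        sx = np.abs(x_r).max(axis=1, keepdims=True) / 127.0 + 1e-12
        sw = np.abs(w_r).max(axis=0, keepdims=True) / 127.0 + 1e-12
        xq = np.clip(np.round(x_r / sx), -127, 127)
        wq = np.clip(np.round(w_r / sw), -127, 127)
        # Simulated integer matmul (kept in float here to avoid INT8 overflow
        # in NumPy), followed by dequantization with the row/column scales.
        y = y + (xq @ wq) * sx * sw
    return y

# Example: a batch of activations with a few artificially large columns.
x = np.random.randn(8, 512).astype(np.float32)
x[:, :4] *= 20.0  # inject outlier feature dimensions
w = (np.random.randn(512, 256) * 0.02).astype(np.float32)
y_exact = x @ w
y_quant = int8_matmul_with_outliers(x, w)
print("max relative error:", np.abs(y_quant - y_exact).max() / np.abs(y_exact).max())
```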