Post-Training Quantization (PTQ)
From this blog: https://semianalysis.com/2024/01/11/neural-network-quantization-and-number/
Post-training quantization (PTQ) requires no actual training steps; it just updates the weights using relatively simple algorithms:
- The easiest approach is simply to scale each weight and round it to the nearest representable value (see the sketch after this list).
- LLM.int8() quantizes the weights to INT8 except for the dimensions tied to a small number of activation outliers, which stay in higher precision (sketch below).
- GPTQ uses second-order (Hessian) information from calibration inputs to quantize the weights with less layer-wise reconstruction error (sketch below).
- SmoothQuant applies a mathematically equivalent per-channel rescaling that migrates quantization difficulty from activations to weights, smoothing out activation outliers (sketch below).
- AWQ uses activation magnitudes to find the most salient weight channels and quantize them more accurately (sketch below).
- QuIP preprocesses model weights to make them less sensitive to quantization.
- AdaRound optimizes the rounding direction (up or down) of each layer's weights separately, formulated as a quadratic unconstrained binary optimization (sketch below).
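
A minimal sketch of round-to-nearest quantization, assuming symmetric per-tensor INT8 with an absmax scale (the bit width and scale granularity here are illustrative choices, not something the blog prescribes):

```python
import numpy as np

def rtn_quantize(w, n_bits=8):
    """Symmetric absmax round-to-nearest: w ~= scale * q with integer q."""
    qmax = 2 ** (n_bits - 1) - 1                      # 127 for INT8
    scale = np.abs(w).max() / qmax                    # per-tensor scale; per-channel is also common
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def rtn_dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = rtn_quantize(w)
print("max abs rounding error:", np.abs(w - rtn_dequantize(q, scale)).max())
```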
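
A rough sketch of the LLM.int8() idea: feature dimensions whose activations exceed an outlier threshold (6.0 is the paper's default) are multiplied in full precision, while everything else goes through an INT8 matmul with row-wise scales for the activations and column-wise scales for the weights. Real implementations keep the outlier part in FP16 and use an INT32 accumulator; this NumPy sketch only shows the decomposition and the scaling:

```python
import numpy as np

def absmax_quant(a, axis, n_bits=8):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(a).max(axis=axis, keepdims=True) / qmax
    return np.clip(np.round(a / scale), -qmax - 1, qmax), scale

def mixed_int8_matmul(X, W, threshold=6.0):
    """Y = X @ W; X: (tokens, d_in) activations, W: (d_in, d_out) weights."""
    outlier = np.abs(X).max(axis=0) > threshold       # outlier feature dimensions
    regular = ~outlier

    Y_outlier = X[:, outlier] @ W[outlier, :]         # kept in full precision

    Xq, sx = absmax_quant(X[:, regular], axis=1)      # row-wise (per-token) scales
    Wq, sw = absmax_quant(W[regular, :], axis=0)      # column-wise (per-output) scales
    Y_regular = (Xq @ Wq) * sx * sw                   # dequantize the accumulator

    return Y_outlier + Y_regular

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 64))
X[:, 3] *= 20.0                                       # inject one outlier feature dimension
W = rng.standard_normal((64, 32))
print("max abs error:", np.abs(mixed_int8_matmul(X, W) - X @ W).max())
```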
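
A stripped-down sketch of the GPTQ-style update: quantize the weight matrix one input column at a time and fold each column's quantization error into the not-yet-quantized columns, using the Cholesky factor of the inverse Hessian built from calibration inputs. GPTQ's blocking, group-wise scales, and other engineering are omitted; the damping factor, bit width, and demo data are illustrative:

```python
import numpy as np

def gptq_quantize(W, X, n_bits=4, percdamp=0.01):
    """W: (d_out, d_in) weights, X: (d_in, n_samples) calibration inputs.

    Greedily reduces || W @ X - W_q @ X ||^2; every row of W shares the
    same Hessian H = X @ X.T, so all rows are processed in parallel.
    """
    d_out, d_in = W.shape
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax     # per-output-channel scale

    H = X @ X.T
    H += percdamp * np.mean(np.diag(H)) * np.eye(d_in)      # damping for numerical stability
    Hinv = np.linalg.cholesky(np.linalg.inv(H)).T           # upper Cholesky factor of H^-1

    W = W.copy()
    Q = np.zeros_like(W)
    for i in range(d_in):
        w = W[:, i]
        q = np.clip(np.round(w / scale[:, 0]), -qmax - 1, qmax) * scale[:, 0]
        Q[:, i] = q
        err = (w - q) / Hinv[i, i]
        W[:, i:] -= np.outer(err, Hinv[i, i:])              # compensate remaining columns
    return Q

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 32))
# Correlated calibration inputs; error compensation typically helps most here.
X = np.linalg.cholesky(0.9 * np.ones((32, 32)) + 0.1 * np.eye(32)) @ rng.standard_normal((32, 256))
Q = gptq_quantize(W, X)
scale = np.abs(W).max(axis=1, keepdims=True) / 7
W_rtn = np.clip(np.round(W / scale), -8, 7) * scale
print("gptq:", np.linalg.norm(Q @ X - W @ X), " rtn:", np.linalg.norm(W_rtn @ X - W @ X))
```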
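
The core SmoothQuant transformation as a standalone NumPy sketch: rescale each input channel by s_j = max|X_j|^α / max|W_j|^(1-α), where α = 0.5 is the paper's default migration strength. In a real model the division of the activations by s is folded into the preceding layer rather than applied at runtime:

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """X: (tokens, d_in) activations, W: (d_in, d_out) weights.

    Returns (X_s, W_s) with X_s @ W_s == X @ W, but with activation
    outliers partially migrated into the weights.
    """
    s = (np.abs(X).max(axis=0) ** alpha) / (np.abs(W).max(axis=1) ** (1 - alpha) + eps)
    s = np.maximum(s, eps)
    return X / s, W * s[:, None]

def outlier_ratio(A):
    per_channel_max = np.abs(A).max(axis=0)
    return per_channel_max.max() / per_channel_max.mean()

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 64))
X[:, 5] *= 30.0                                  # one outlier activation channel
W = rng.standard_normal((64, 32))
X_s, W_s = smooth(X, W)
print("mathematically equivalent:", np.allclose(X_s @ W_s, X @ W))
print("activation outlier ratio before/after:", outlier_ratio(X), outlier_ratio(X_s))
```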
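
A sketch of the AWQ-style scale search: boost each input channel in proportion to its average activation magnitude (s = mean|X_j|^α), quantize the scaled weights, fold the scales back, and grid-search α to minimize the layer's output error on calibration data. Group-wise quantization and folding the scales into the previous layer are omitted; the grid and the per-output-channel quantization scales are simplifying assumptions:

```python
import numpy as np

def quantize_rtn(W, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax    # per-output-channel scale
    return np.clip(np.round(W / scale), -qmax - 1, qmax) * scale

def awq_search(W, X, n_bits=4):
    """W: (d_out, d_in) weights, X: (n, d_in) calibration activations."""
    saliency = np.abs(X).mean(axis=0) + 1e-8               # per-input-channel activation magnitude
    Y_ref = X @ W.T
    best_err, best_W = np.inf, None
    for alpha in np.linspace(0.0, 1.0, 21):                # alpha = 0 reduces to plain RTN
        s = saliency ** alpha
        W_q = quantize_rtn(W * s[None, :], n_bits) / s[None, :]   # scale, quantize, fold back
        err = np.mean((X @ W_q.T - Y_ref) ** 2)
        if err < best_err:
            best_err, best_W = err, W_q
    return best_W, best_err

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
X[:, :4] *= 10.0                                           # a few high-magnitude (salient) channels
W = rng.standard_normal((16, 64))
W_awq, err_awq = awq_search(W, X)
err_rtn = np.mean((X @ quantize_rtn(W).T - X @ W.T) ** 2)
print("awq:", err_awq, " rtn:", err_rtn)
```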
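
A compact AdaRound-style sketch in PyTorch: for one linear layer, learn a per-weight soft rounding variable h in [0, 1] (rectified sigmoid) that decides whether each weight rounds down or up, trading off the layer's reconstruction error against a regularizer that pushes h to exactly 0 or 1. The per-tensor scale, fixed β, and other hyperparameters are simplifications; the paper anneals β and optimizes block-wise:

```python
import torch

def adaround(W, X, n_bits=4, iters=2000, lr=1e-2, lam=0.01, beta=20.0):
    """W: (d_out, d_in) float weights, X: (n, d_in) calibration inputs."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = W.abs().max() / qmax                       # per-tensor scale for simplicity
    W_floor = torch.floor(W / scale)
    V = torch.zeros_like(W, requires_grad=True)        # soft rounding starts at h = 0.5
    opt = torch.optim.Adam([V], lr=lr)
    Y_ref = X @ W.t()

    for _ in range(iters):
        h = torch.clamp(torch.sigmoid(V) * 1.2 - 0.1, 0.0, 1.0)     # rectified sigmoid
        W_q = torch.clamp(W_floor + h, -qmax - 1, qmax) * scale
        recon = (X @ W_q.t() - Y_ref).pow(2).mean()                 # layer output reconstruction
        reg = (1 - (2 * h - 1).abs().pow(beta)).sum()               # pushes h toward 0 or 1
        loss = recon + lam * reg
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():                              # harden the learned rounding decisions
        h_hard = (torch.sigmoid(V) * 1.2 - 0.1).clamp(0.0, 1.0).round()
        return torch.clamp(W_floor + h_hard, -qmax - 1, qmax) * scale

torch.manual_seed(0)
W, X = torch.randn(16, 64), torch.randn(256, 64)
W_q = adaround(W, X)
W_rtn = torch.clamp(torch.round(W / (W.abs().max() / 7)), -8, 7) * (W.abs().max() / 7)
print("adaround:", (X @ W_q.t() - X @ W.t()).pow(2).mean().item(),
      " rtn:", (X @ W_rtn.t() - X @ W.t()).pow(2).mean().item())
```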