AI inference libraries: TensorRT (not really a library), llama.cpp, vLLM. All of these do quantization. What about pruning?
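To make the question concrete, here is a minimal sketch of the two ideas side by side. It is plain Python on a toy weight list and is not how TensorRT, llama.cpp, or vLLM implement either technique. The function names `quantize_int8`, `dequantize`, and `magnitude_prune` are hypothetical, and the choices of symmetric per-tensor int8 and unstructured magnitude pruning are assumptions made for illustration.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative assumption).

    Every weight is kept, but stored as an integer in [-127, 127]
    together with one shared floating-point scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning (illustrative assumption).

    The fraction `sparsity` of weights with the smallest absolute value
    is set to zero; the surviving weights keep full precision.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Toy weights, made up for this example.
weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6]

q, scale = quantize_int8(weights)        # lower precision, same count
restored = dequantize(q, scale)          # close to the originals
pruned = magnitude_prune(weights, 0.5)   # same precision, half are zero
```

Quantization shrinks every weight; pruning removes some weights entirely. Pruning only turns into a speed or memory win if the runtime has sparse storage or sparse kernels that can skip the zeros, which is the part worth checking for each library.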