Steps

Export the FP32 model to ONNX format and verify it with onnx.checker.check_model()
Prepare a calibration dataset as a CalibrationDataReader subclass whose get_next() returns one dict per call, mapping the model's input names to NumPy arrays, and returns None once the data is exhausted
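Run quantize_static(model_input, model_output, calibration_data_reader, quant_format=QuantFormat.QOperator) for operator-oriented quantization, which replaces supported ops with quantized counterparts such as QLinearMatMul and QLinearConv (the default, QuantFormat.QDQ, instead wraps the original ops in QuantizeLinear/DequantizeLinear pairs)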
Load the quantized model with onnxruntime.InferenceSession and run predictions on a validation set to measure accuracy vs the FP32 baseline
Compare model size (file bytes) and latency (wall-clock inference time) between FP32 and INT8 versions on the target hardware
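For the accuracy step, the comparison reduces to running both models over the same validation batches and comparing top-1 predictions. A minimal sketch, assuming a classifier whose first output is logits of shape (batch, classes) and integer labels; predict_logits and accuracy_report are hypothetical helpers. Besides the accuracy drop it reports agreement, the fraction of samples where INT8 picks the same class as FP32, which exposes quantization error even when overall accuracy barely moves.

```python
import numpy as np


def predict_logits(session, batches):
    """Run every batch through an onnxruntime.InferenceSession-like object
    and stack its first output (assumed to be class logits)."""
    input_name = session.get_inputs()[0].name
    outputs = [session.run(None, {input_name: batch})[0] for batch in batches]
    return np.concatenate(outputs, axis=0)


def accuracy_report(fp32_logits, int8_logits, labels):
    """Top-1 accuracy of both models, the accuracy drop, and how often
    the INT8 model predicts the same class as the FP32 baseline."""
    fp32_pred = fp32_logits.argmax(axis=1)
    int8_pred = int8_logits.argmax(axis=1)
    fp32_acc = float((fp32_pred == labels).mean())
    int8_acc = float((int8_pred == labels).mean())
    return {
        "fp32_top1": fp32_acc,
        "int8_top1": int8_acc,
        "top1_drop": fp32_acc - int8_acc,
        "agreement": float((fp32_pred == int8_pred).mean()),
    }
```

In practice the session arguments would be onnxruntime.InferenceSession objects created from the FP32 and INT8 files with the same providers list, for example providers=["CPUExecutionProvider"].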
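The size and latency comparison can be measured with the standard library alone. A sketch under the assumption that run_once is any zero-argument callable wrapping one inference call; file_size_mb, median_latency_ms and compare are hypothetical helpers, and the warm-up and run counts are arbitrary defaults. The median is used rather than the mean so that a few slow outliers (scheduler noise, cache misses) do not dominate.

```python
import os
import statistics
import time


def file_size_mb(path):
    """On-disk size of a model file in megabytes (1 MB = 1,000,000 bytes)."""
    return os.path.getsize(path) / 1_000_000


def median_latency_ms(run_once, warmup=10, runs=100):
    """Median wall-clock time of run_once() in milliseconds.

    Warm-up calls are discarded so one-time costs (allocation,
    kernel selection) do not skew the measurement.
    """
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def compare(fp32_path, int8_path, fp32_run, int8_run):
    """Size ratio and latency of the INT8 model relative to the FP32 one."""
    fp32_ms = median_latency_ms(fp32_run)
    int8_ms = median_latency_ms(int8_run)
    return {
        "size_ratio": file_size_mb(int8_path) / file_size_mb(fp32_path),
        "fp32_ms": fp32_ms,
        "int8_ms": int8_ms,
        "speedup": fp32_ms / int8_ms,
    }
```

With ONNX Runtime, run_once would typically be lambda: session.run(None, feed) for a session created with the execution provider you deploy on; feed both models the same input and batch size, on the target machine.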

Known gotchas

Static quantization needs a representative calibration dataset, ideally at least 100 samples; too few or unrepresentative samples produce poor scale/zero-point estimates and can cause significant accuracy degradation
Not all ONNX operators support INT8 quantization — unsupported ops are automatically left in FP32 (a 'mixed precision' graph); inspect the quantized model with Netron to verify key ops were quantized
Quantization on CPU vs GPU can produce different numeric results due to different implementations of quantized matmul — always benchmark on the actual deployment hardware

Related routes

Export a PyTorch model to ONNX and run inference with ONNX Runtime

Export a PyTorch model to ONNX and validate output parity with onnxruntime


Quantize a model to INT8 with ONNX Runtime quantization and validate accuracy degradation
