Export a PyTorch model to ONNX using torch.onnx.export(model, example_input, 'model.onnx', input_names=['input'], output_names=['output'], dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})
Verify the exported model with onnx.checker.check_model(onnx.load('model.onnx')) to catch shape or opset inconsistencies before optimization
Apply graph optimizations offline: with onnxruntime imported as ort, create a SessionOptions object, set sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL, set sess_options.optimized_model_filepath = 'model_opt.onnx', then create an InferenceSession on 'model.onnx' with these options; creating the session runs the optimizer and saves the optimized graph to that path
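Load the optimized model for inference with a fresh SessionOptions object (run_options below) rather than the one used for offline optimization, which would re-optimize the graph and overwrite 'model_opt.onnx'; set run_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL to skip redundant optimization at load time: session = ort.InferenceSession('model_opt.onnx', run_options, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])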
Run inference: outputs = session.run(None, {'input': input_array}) where input_array is a numpy array matching the declared input shape and dtype
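Profile performance by setting enable_profiling = True on the SessionOptions object before creating the session; after running inference, session.end_profiling() writes a JSON trace file and returns its path, which helps identify bottlenecks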
Known gotchas
The opset_version passed to torch.onnx.export must be supported by the installed ONNX Runtime version. A model exported with a newer opset than the runtime supports fails to load; check the ONNX Runtime release notes for the supported opsets
Dynamic axes must be declared at export time; a model exported with a fixed batch size fails at inference with a shape mismatch error when given a batch of a different size
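CUDAExecutionProvider must be listed before CPUExecutionProvider in the providers list to use the GPU, because providers are tried in list order; if CUDA is unavailable, ONNX Runtime falls back to CPU rather than raising an error (at most it logs a warning), so call session.get_providers() to confirm which provider is actually in use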