Understanding ONNX
The model format every framework can read.
What's inside an .onnx file, what the metadata reveals, and how the same model runs across PyTorch, TensorFlow, and dozens of runtimes.
The neutral middle.
ONNX (Open Neural Network Exchange, 2017) is a Protobuf-encoded graph format for neural networks. The graph is described operator by operator: Conv, MatMul, Relu, Softmax. Any framework that exports to ONNX can hand its models to any runtime that imports it: train in PyTorch, then deploy through ONNX Runtime, TensorRT, or OpenVINO, or convert onward to Core ML or TFLite. The format decouples the training framework from the production runtime.
What the file contains.
A graph definition: nodes (operators), inputs, outputs, and the intermediate tensors that connect them. The weights, stored as initialiser tensors. The opset version (which version of the operator spec the graph targets). Optional metadata: model name, producer, version, doc string. Usually all of this lives in one .onnx Protobuf file, from a few megabytes to a few gigabytes depending on the model; because a single Protobuf message is capped at 2 GB, models above that size keep their weights in external data files alongside the .onnx. Visualisers like Netron render the graph; the metadata-extractor tool surfaces the high-level facts without rendering.
The metadata worth reading.
producer_name + producer_version: what exported it (PyTorch 2.1, tf2onnx 1.15, etc.). opset_import: the operator-set version for each domain, which determines which runtimes can load the model. graph.input: the input tensors and their shapes (often with a dynamic batch dimension). graph.output: the output tensors and their shapes. From the graph itself you can also derive the total parameter count and approximate FLOPs. Before deploying a third-party model, this metadata tells you whether it will fit your runtime.
Opset versions matter.
The ONNX spec evolves; new opset versions add operators or change the behaviour of existing ones. A model exported with opset 18 won't load on a runtime that only supports up to opset 13, because the runtime has no implementations for the operator versions the model declares. The fix is either to upgrade the runtime or to re-export the model at a lower opset. The exporter's compatibility flag (e.g. opset_version=13 in torch.onnx.export) is the easy lever, but not a guaranteed one: some operators simply don't exist in older opsets, so the exporter has to rebuild them from older primitives or give up.
A worked inspection.
Download a third-party detection model: yolov8n.onnx, 12 MB. Metadata inspection: exported by Ultralytics 8.2, opset 17, input shape [1, 3, 640, 640] (NCHW), output shape [1, 84, 8400], ~3 million parameters. The opset tells you ONNX Runtime 1.12 or newer is required (the first release to support opset 17); the input shape tells you images need resizing to 640×640 and transposing to CHW layout; the output shape tells you there are 8400 candidate boxes (one per grid cell across the 80×80, 40×40 and 20×20 prediction grids), each with 84 values (4 box coordinates + 80 class scores). Three numbers, the whole deployment story.
Figure: YOLOv8n.onnx (12 MB, opset 17). Inspect before deploying: input [1,3,640,640]; output [1,84,8400].
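The preprocessing and output layout from that inspection can be sketched in numpy. Several assumptions here are not read from the file: that the image is already 640×640 RGB, that pixels are scaled to [0, 1] as in Ultralytics exports, that rows 0 to 3 of the output are (cx, cy, w, h) in input pixels, and that the 80 class scores are already probabilities. Non-maximum suppression, which a real pipeline still needs, is omitted.

```python
import numpy as np

# Candidate count for a 640x640 input: one prediction per grid cell on the
# stride-8, stride-16 and stride-32 maps (80x80 + 40x40 + 20x20 = 8400).
NUM_CANDIDATES = sum((640 // stride) ** 2 for stride in (8, 16, 32))


def preprocess(image_hwc: np.ndarray) -> np.ndarray:
    """uint8 RGB image already sized 640x640x3 -> float32 tensor [1, 3, 640, 640].

    Assumes [0, 1] pixel scaling; resizing and letterboxing are left out.
    """
    x = image_hwc.astype(np.float32) / 255.0
    x = np.transpose(x, (2, 0, 1))  # HWC -> CHW
    return x[np.newaxis]            # add batch dim -> NCHW


def decode(output: np.ndarray, conf_threshold: float = 0.25) -> list:
    """[1, 84, 8400] -> [(x1, y1, x2, y2, class_id, score), ...] before NMS.

    Assumes rows 0-3 are (cx, cy, w, h) in input pixels and rows 4-83 are
    per-class scores that are already probabilities.
    """
    preds = output[0].T                        # (8400, 84): one row per candidate
    boxes, scores = preds[:, :4], preds[:, 4:]
    class_ids = scores.argmax(axis=1)          # best class per candidate
    best = scores[np.arange(len(scores)), class_ids]
    keep = best >= conf_threshold
    cx, cy, w, h = boxes[keep].T
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return [
        (*map(float, box), int(cls), float(score))
        for box, cls, score in zip(xyxy, class_ids[keep], best[keep])
    ]


if __name__ == "__main__":
    x = preprocess(np.zeros((640, 640, 3), dtype=np.uint8))
    print(x.shape, x.dtype)
    fake = np.zeros((1, 84, NUM_CANDIDATES), dtype=np.float32)
    fake[0, :4, 5] = [100, 200, 50, 60]  # one made-up box at candidate 5
    fake[0, 4 + 3, 5] = 0.9              # class 3 with score 0.9
    print(decode(fake))
```

Transposing to one row per candidate first keeps the box and score slicing readable; the same decode works unchanged for any batch-1 output with this layout.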
Quantisation and pruning.
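ONNX supports quantised models: weights stored as int8 instead of float32, roughly 4× smaller and usually faster on hardware with int8 kernels. Static quantisation needs calibration data to fix activation scales ahead of time; dynamic quantisation needs no calibration data because it computes activation scales at inference time, at the cost of some runtime overhead. The metadata tool shows the operator types (Conv vs QLinearConv, MatMul vs MatMulInteger, or QuantizeLinear/DequantizeLinear pairs around ordinary ops in QDQ-format models), so you know whether a downloaded model is quantised before you wonder why it's so fast. Pruning shows up as zeroed weights: ONNX keeps them in the same dense tensors, so an unstructured-pruned model is no smaller on disk and typically no faster on a standard runtime unless something downstream exploits the zeros.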