Model Packaging and Export
A trained .pt checkpoint is not a deployable artifact. Loading it requires a compatible PyTorch version, a Python runtime, and the exact model class definition used at training time. ONNX and TorchScript decouple the computation graph from the Python runtime, enabling deployment to C++ servers, mobile devices, and inference runtimes that outperform plain PyTorch by 2–5x on CPU.
Export Formats Compared
| Format | Portability | Speed vs eager | Requirements |
|---|---|---|---|
| PyTorch eager (.pt state dict) | Python only | Baseline | Model class definition |
| TorchScript (.pt traced/scripted) | C++ + Python | 1.2–2x | Supported ops only |
| ONNX (.onnx) | Any ONNX runtime | 1.5–4x | No dynamic control flow |
| ONNX + quantized INT8 | Any ONNX runtime | 3–6x | Slight accuracy trade-off |
Exporting to ONNX
`dynamic_axes` is critical: without it, the exported graph is locked to the batch size of the example input (typically 1) and will fail on any other batch size.
Validating the ONNX Graph
If the numerical check fails, the graph optimiser has introduced numerical drift: either loosen the atol/rtol tolerances to an acceptable level or disable constant folding during export.
TorchScript Export
Prefer scripting when your model has conditional logic. Use tracing for pure feed-forward networks — it is faster to export and produces cleaner graphs.
Benchmarking Inference Latency
Always warm up before benchmarking — the first several calls load kernels and JIT-compile layers, inflating measured latency.
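A minimal benchmarking loop along these lines, with warmup iterations discarded before timing begins (the model and iteration counts are illustrative):

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 64)).eval()
x = torch.randn(32, 256)

with torch.no_grad():
    # Warmup: the first calls load kernels and trigger lazy initialisation,
    # so their latency is not representative
    for _ in range(50):
        model(x)

    # Timed runs: average over many iterations for a stable estimate
    n_iters = 500
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    elapsed = time.perf_counter() - start

mean_ms = elapsed / n_iters * 1000
print(f"mean latency: {mean_ms:.3f} ms")
```

For a TorchScript or ONNX model, swap `model(x)` for the corresponding runtime call; the warmup-then-measure structure stays the same.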
Summary
- State dicts alone are not deployable artifacts; export to ONNX or TorchScript for portability and speed.
- Use `dynamic_axes` in ONNX export to support variable batch sizes at inference time.
- Always validate exported models numerically against PyTorch outputs before deploying.
- Prefer TorchScript scripting over tracing when the model contains conditional Python logic.
- Benchmark with warmup runs and measure over at least 500 iterations to get stable latency estimates.