Multi-threaded Scaling
The ONNX Runtime supports multi-threaded inference and can benefit from multiple CPU cores.
Using multiple threads on a multi-core CPU can reduce the latency of creating a vector for most embedding models. It can also increase throughput by parallelizing vector creation across requests. By default, the ONNX Runtime sizes its intra-op thread pool from the number of available CPU cores, not from the workload, and inter-op parallelism only applies when the parallel execution mode is enabled.
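The trade-off between per-request latency and overall throughput can be sketched in Python. This is a minimal illustration, not a recommended configuration: split_cores and make_session_options are hypothetical helper names, and dividing the cores evenly across concurrent requests is an assumed heuristic. The only ONNX Runtime calls used are the SessionOptions attributes intra_op_num_threads and execution_mode from the onnxruntime Python package.

```python
import os


def split_cores(total_cores: int, concurrent_requests: int) -> int:
    """Hypothetical heuristic: intra-op threads per request, chosen so that
    concurrent_requests * threads does not exceed total_cores."""
    if total_cores < 1 or concurrent_requests < 1:
        raise ValueError("both arguments must be >= 1")
    # Never go below one thread, even when requests outnumber cores.
    return max(1, total_cores // concurrent_requests)


def make_session_options(intra_op_threads: int):
    """Build onnxruntime.SessionOptions with an explicit intra-op thread count."""
    # Imported lazily so the heuristic above works without onnxruntime installed.
    import onnxruntime as ort

    opts = ort.SessionOptions()
    # Threads used to parallelize work inside a single operator.
    opts.intra_op_num_threads = intra_op_threads
    # Sequential mode runs operators one at a time, so inter-op threads are unused.
    opts.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL
    return opts


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # Latency-oriented: one request at a time gets every core.
    print("latency-oriented threads:", split_cores(cores, 1))
    # Throughput-oriented: four concurrent requests share the cores.
    print("throughput-oriented threads:", split_cores(cores, 4))
```

Giving one request every core favors latency, while a smaller per-request thread count leaves cores free for other requests and favors throughput. The right split depends on the model and the traffic, so it is worth measuring both.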