This diagram illustrates an architecture for serving Hugging Face models with TorchServe for inference. TorchServe
is deployed on Oracle Cloud Infrastructure (OCI) using Oracle Cloud Infrastructure Kubernetes Engine (OKE).
- A user group on the internet makes inference requests. The requests enter a virtual cloud network (VCN) in an OCI region through an internet
gateway and are routed to OCI Load Balancing, which is deployed in a public subnet.
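From a client's point of view, the load balancer fronts TorchServe's inference API, which serves predictions at `POST /predictions/{model_name}` (port 8080 by default). A minimal sketch of such a request follows; the hostname `lb.example.com` and model name `my-model` are placeholders, not part of the architecture.

```python
# Sketch of a client inference request routed through OCI Load Balancing
# to TorchServe's default inference API (POST /predictions/{model_name}).
# The hostname and model name below are placeholder assumptions.

def inference_url(host: str, model_name: str) -> str:
    """Build the TorchServe prediction URL exposed behind the load balancer."""
    return f"https://{host}/predictions/{model_name}"


def send_request(host: str, model_name: str, text: str) -> str:
    """POST an inference payload; needs the `requests` package and a live endpoint."""
    import requests  # imported here so the module loads without the dependency
    resp = requests.post(
        inference_url(host, model_name),
        data=text.encode("utf-8"),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    # Placeholder endpoint -- replace with the load balancer's public address.
    print(inference_url("lb.example.com", "my-model"))
```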
- OCI Load Balancing directs traffic to an OKE cluster deployed in a private subnet. The cluster consists of three main components: CPU nodes that run the UI and
workers, a CPU node that runs RabbitMQ, and a GPU machine that runs TorchServe.
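The description does not specify how the workers hand requests to TorchServe; one plausible sketch, assuming the workers consume inference jobs from RabbitMQ and forward them to the TorchServe service, is below. The queue name, hostnames, and message schema are all assumptions; `pika` is the standard Python RabbitMQ client.

```python
import json

# Hypothetical worker sketch: consume inference jobs from RabbitMQ and
# forward them to TorchServe. The queue name, hosts, and message schema
# are assumptions, not taken from the architecture description.

def parse_job(body: bytes) -> dict:
    """Decode a queued job message into the model name and input payload."""
    job = json.loads(body)
    return {"model": job["model"], "input": job["input"]}


def run_worker(rabbit_host: str, torchserve_host: str) -> None:
    import pika      # third-party deps imported here so the sketch
    import requests  # stays importable without them installed
    conn = pika.BlockingConnection(pika.ConnectionParameters(rabbit_host))
    channel = conn.channel()
    channel.queue_declare(queue="inference-jobs", durable=True)

    def on_message(ch, method, properties, body):
        job = parse_job(body)
        # Forward to TorchServe's inference API inside the cluster.
        requests.post(
            f"http://{torchserve_host}:8080/predictions/{job['model']}",
            data=job["input"],
            timeout=60,
        )
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="inference-jobs", on_message_callback=on_message)
    channel.start_consuming()
```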
- To deploy models, the model files from Hugging Face are stored in either OCI Object Storage or OCI File Storage; these storage services then supply the model files to the TorchServe GPU machine.
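Staging a TorchServe model archive (`.mar` file) in OCI Object Storage could be scripted with the OCI Python SDK roughly as follows. The bucket name, object layout, and helper names here are assumptions for illustration.

```python
# Hypothetical sketch: upload a TorchServe model archive (.mar) to
# OCI Object Storage so the TorchServe GPU machine can pull it.
# The bucket name and object layout are assumptions.

def archive_object_name(model_name: str, version: str) -> str:
    """Object key under which a model archive is stored in the bucket."""
    return f"model-store/{model_name}/{version}/{model_name}.mar"


def upload_archive(bucket: str, model_name: str, version: str, path: str) -> None:
    import oci  # imported here so the sketch stays importable without the SDK
    config = oci.config.from_file()  # reads credentials from ~/.oci/config
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    with open(path, "rb") as f:
        client.put_object(
            namespace_name=namespace,
            bucket_name=bucket,
            object_name=archive_object_name(model_name, version),
            put_object_body=f,
        )
```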
- The cluster accesses other OCI services privately through an OCI Service Gateway, without traffic traversing the internet.
- Additional OCI services in the architecture include OCI Registry, OCI Logging, OCI Monitoring, OCI Identity and Access Management (IAM) policies, and OCI Vault.