Overview of Deploying Agents in OCI Generative AI

You can deploy agents by using OCI Generative AI Applications, which provide a managed runtime for containerized agent workloads.

To deploy an agent, package it as a container image, upload it to Oracle Cloud Infrastructure Registry (OCIR), and deploy it by using the OCI Console, API, or CLI.

During deployment, configure:

  • Scaling
  • Storage
  • Networking
  • Authentication

After deployment, the service provisions an endpoint (for example, an HTTP URL) that clients or other agents can use to invoke the agent.

How it Works

After developing an agent locally (for example, by using LangGraph or similar frameworks), you create a Generative AI application to define the runtime configuration.

You then create a deployment by selecting a container image. The active deployment serves requests through the application endpoint. After the deployment is provisioned, the endpoint becomes available for invoking the agent.

Walkthrough

Use Generative AI Applications to deploy agents as managed containerized applications in OCI Generative AI.

With Generative AI Applications, you build a container image, upload it to Oracle Cloud Infrastructure Registry (OCIR), and deploy that image as a Generative AI Application by using the OCI Console, API, or CLI.
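The build-and-push step above can be sketched as follows. The region key, tenancy namespace, repository, and tag are placeholders, and the `docker` commands assume Docker is installed and that you have already run `docker login <region-key>.ocir.io` with an auth token:

```python
"""Sketch: compose an OCIR image name and push a locally built agent image."""
import subprocess


def ocir_image(region_key: str, namespace: str, repo: str, tag: str) -> str:
    # OCIR image names follow <region-key>.ocir.io/<tenancy-namespace>/<repo>:<tag>
    return f"{region_key}.ocir.io/{namespace}/{repo}:{tag}"


def build_and_push(image: str, context_dir: str = ".") -> None:
    # Build the agent container locally, then push it to OCIR.
    subprocess.run(["docker", "build", "-t", image, context_dir], check=True)
    subprocess.run(["docker", "push", image], check=True)


# Example (placeholder values):
# image = ocir_image("iad", "mytenancynamespace", "agents/travel-agent", "1.0.0")
# build_and_push(image)
```

After the push completes, you reference this image when creating the deployment in the Console, API, or CLI.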

When you deploy an agent, you can configure how the application runs and how clients access it, including:

  • Scaling
  • Storage
  • Networking
  • Authentication

After the deployment is provisioned, OCI Generative AI provides an endpoint, such as an HTTP URL, that clients can use to invoke the deployed agent.

Deploying an agent is useful when you want a managed runtime for a containerized agent application, with OCI-managed deployment configuration and endpoint provisioning.

For more information, see the topics about Generative AI Applications and deploying containerized agent applications.

Compare Applications with Other OCI Container Deployment Options

Compare Generative AI Applications with OCI Container Instances and Oracle Kubernetes Engine (OKE).

Generative AI Applications provide a managed deployment option for agentic applications and MCP servers. The following tables compare them with other OCI container deployment solutions.

Compare Generative AI Applications with OCI Container Instances

Capability        | GenAI Applications                                            | OCI Container Instances
----------------- | ------------------------------------------------------------- | --------------------------------
Primary use       | Web services, especially agentic applications and MCP servers | Batch jobs, scripts, and workers
Trigger model     | HTTP or event-driven                                          | Manual, API-driven, or scheduled
Scaling           | Automatic scaling from 0 to many instances                    | No built-in autoscaling
Scale to zero     | Yes                                                           | Not automatic
Load balancing    | Built in                                                      | User managed
Abstraction level | Higher-level, serverless-style deployment                     | Lower-level container execution
Startup model     | Fast, request-based startup                                   | Starts like a small VM
Networking        | Managed HTTPS endpoints                                       | VCN-level control

Compare Generative AI Applications with OKE

Capability          | GenAI Applications                    | OKE
------------------- | ------------------------------------- | --------------------------------------------
Operations overhead | Low                                   | High
Scaling             | Automatic scaling from 0 to N         | Configurable with HPA and cluster autoscaling
Scale to zero       | Yes                                   | Not native
Deployment          | Simple, by pushing a container image  | More complex, with manifests and Helm charts
Control             | Limited                               | Full control
Networking          | Fully managed                         | Fully customizable
Use case            | APIs and stateless services           | Complex distributed systems

Supported Transport Protocols

In OCI Generative AI, after an agent is deployed, clients can invoke it through the provisioned endpoint. The transport protocol depends on the agent server implementation and the interaction model required (request/response, streaming, or bidirectional sessions).

The supported protocols include:

HTTP

HTTP is the most widely supported invocation model.

  • Interaction model: Stateless request/response
  • Transport: HTTP/1.1 or HTTP/2 over TLS
  • Use case: Synchronous API calls and short-lived inference requests

In this mode, the client sends an HTTP request (typically POST with a JSON payload). The server returns a single response after processing completes.
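A minimal client for this mode, using only the Python standard library. The endpoint URL and JSON payload shape are hypothetical; substitute the URL provisioned for your deployment and whatever schema your agent server expects:

```python
"""Sketch: invoke a deployed agent with a single HTTP request/response."""
import json
import urllib.request


def build_request(endpoint: str, payload: dict, token: str) -> urllib.request.Request:
    # One POST with a JSON body; the server replies once processing completes.
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def invoke_agent(endpoint: str, payload: dict, token: str) -> dict:
    with urllib.request.urlopen(build_request(endpoint, payload, token)) as resp:
        return json.load(resp)


# Example (placeholder endpoint):
# result = invoke_agent("https://<endpoint>/agent", {"input": "Hello"}, token)
```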

SSE (Server-Sent Events)

Server-Sent Events (SSE) is a unidirectional streaming protocol built on top of HTTP.

  • Interaction model: Client to server (single request), server to client (streamed response)
  • Transport: HTTP with Content-Type: text/event-stream
  • Use case: Streaming responses (for example, token-by-token output)

In this mode, the client sends a request and the server keeps the connection open while streaming incremental results as events.
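The client's job is to read the stream and reassemble events. A minimal SSE parser is sketched below; real streams may also carry `event:`, `id:`, and `retry:` fields, which this sketch ignores:

```python
"""Sketch: parse a text/event-stream response into event payloads."""


def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    data = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            data.append(line[5:].lstrip())
        elif line == "" and data:
            # A blank line terminates the event; multi-line data joins with \n.
            yield "\n".join(data)
            data = []


# Example: a token-by-token stream from the agent.
stream = ["data: Hel\n", "\n", "data: lo\n", "\n"]
print(list(iter_sse_data(stream)))  # ['Hel', 'lo']
```

In practice you would feed this generator the response body line by line while the connection stays open.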

WebSocket (Full Duplex Streaming)

WebSocket provides persistent, bidirectional communication between client and server.

  • Interaction model: Full duplex (client and server can send messages at any time)
  • Transport: WebSocket protocol (wss://)
  • Use case: Interactive agents, real-time tool execution, and multi-turn sessions

After the initial HTTP upgrade handshake, the connection remains open, enabling bidirectional message exchange over a persistent channel.
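A client-side sketch of such a session. Deriving the `wss://` URL from the HTTPS endpoint is an assumption, the JSON message shape is hypothetical, and the session loop uses the third-party `websockets` package:

```python
"""Sketch: a full-duplex session with a deployed agent over WebSocket."""
import asyncio
import json


def make_ws_url(https_endpoint: str) -> str:
    # Assumption: the WebSocket endpoint shares the host and path of the HTTP one.
    return "wss://" + https_endpoint.removeprefix("https://")


async def chat(ws_url: str, prompts: list) -> None:
    import websockets  # third-party; imported lazily so the sketch parses without it

    async with websockets.connect(ws_url) as ws:
        for prompt in prompts:
            # Client and server may each send at any time; here we simply alternate.
            await ws.send(json.dumps({"role": "user", "content": prompt}))
            print(await ws.recv())


# Example (placeholder endpoint):
# asyncio.run(chat(make_ws_url("https://<endpoint>/agent"), ["Hello"]))
```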

Authentication

Set up inbound authentication to control access to agents and outbound authentication to securely access OCI resources.

Generative AI Applications support OAuth 2.0 authentication through an identity domain. See Setting up Authentication for Agentic Support.

Inbound Authentication

Inbound authentication controls who can access your agents by validating tokens from identity providers before routing requests to hosted agents.

OCI Generative AI supports OAuth 2.0 for inbound authentication, integrated with identity providers such as Oracle Identity Cloud Service (IDCS). See Setting up Authentication for Agentic Support.
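A client calling a protected agent first obtains an access token from the identity domain, then presents it as a bearer token. The sketch below builds a client-credentials token request; the domain URL, scope, and client credentials are placeholders, and `/oauth2/v1/token` follows the usual IDCS-style token endpoint layout (confirm against your domain's OAuth configuration):

```python
"""Sketch: build an OAuth 2.0 client-credentials token request for an identity domain."""
import base64
import urllib.parse
import urllib.request


def token_request(domain_url: str, client_id: str, client_secret: str, scope: str):
    # IDCS-style identity domains expose a token endpoint at /oauth2/v1/token.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode()
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"{domain_url}/oauth2/v1/token",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {basic}",
        },
        method="POST",
    )


# The JSON response to this request carries an access_token, which the client
# then sends to the agent endpoint in an "Authorization: Bearer <token>" header.
```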

Outbound Authentication

With outbound authentication, deployed agent applications can securely access other OCI resources within a tenancy.

Access is granted by defining OCI IAM policies that authorize the agent application (as a resource principal) to perform specific actions on specified resources. These policies determine the scope of access and should follow the principle of least privilege.
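In many OCI services, resource-principal access is granted by matching the resource into a dynamic group and writing policies against that group. A hypothetical sketch follows; the group name, compartment, and matching rule for Generative AI application resources are assumptions to verify against your tenancy:

```
# Hypothetical policy statements granting an agent application least-privilege access:
Allow dynamic-group agent-apps to read objects in compartment agents-compartment
Allow dynamic-group agent-apps to use secret-family in compartment agents-compartment
```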

After deployment, the platform automatically provisions a Resource Principal Session Token (RPST) for the agent workload. The RPST is securely injected into the container runtime, allowing the application to authenticate to OCI services without using long-lived credentials such as API keys or user tokens.

Within the container, the agent uses the OCI SDK with the resource principal authentication provider. The SDK automatically retrieves and refreshes the RPST, enabling secure access to authorized OCI services such as Object Storage, Autonomous Database, Vault, and Streaming.
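With the Python SDK, that pattern looks like the sketch below. The resource principal signer (`oci.auth.signers.get_resource_principals_signer`) is part of the OCI Python SDK; the namespace, bucket, and object names are placeholders:

```python
"""Sketch: use the resource principal signer from inside the agent container."""


def make_object_storage_client():
    import oci  # OCI Python SDK; imported lazily so the sketch parses without it

    # The signer reads the injected RPST and refreshes it automatically.
    signer = oci.auth.signers.get_resource_principals_signer()
    return oci.object_storage.ObjectStorageClient(config={}, signer=signer)


# Example (placeholder names):
# client = make_object_storage_client()
# obj = client.get_object("my-namespace", "agent-state-bucket", "checkpoint.json")
```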

Networking for Deployments

In OCI Generative AI, by default, deployed applications have outbound access to the public internet. This allows agent workloads to access external resources such as public MCP servers, third-party APIs, foundation model endpoints, and other internet-hosted services.

For private networking, you can enable Customer Networking Mode. In this mode, you specify a target subnet in a VCN within your tenancy. The platform establishes a secure connection between the agent workload and the subnet by using a Private Endpoint / Reverse Connection Endpoint (PE/RCE).

When enabled, all outbound (egress) traffic from the agent is routed through the specified subnet. This allows:

  • Secure access to private resources in your network (for example, databases, compute instances, and internal services)
  • Traffic to remain within private network boundaries
  • Network security controls such as Network Security Groups (NSGs), route tables, and firewalls to govern outbound connectivity
  • Restriction or disabling of public internet access, based on your security requirements

This model supports both internet-facing workloads and private, enterprise-integrated deployments while maintaining clear network isolation between the platform and your environment.

Managed Storage

Agent workloads often require stateful services to support short-term memory, checkpoints, caching, and context storage. To simplify operations, OCI Generative AI provides fully managed storage services for deployed agents.

When deploying an agent, you can select one or more of the following managed storage options:

  • PostgreSQL
  • OCI Cache
  • Oracle Autonomous Database

These services are automatically provisioned and configured for your application.
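Inside the container, the agent typically picks up the provisioned connection details from its runtime configuration. The sketch below assembles a PostgreSQL connection string from environment variables; the variable names are hypothetical, and the platform's actual injection mechanism may differ:

```python
"""Sketch: build a connection string for service-managed PostgreSQL."""
import os


def managed_pg_dsn(env=None):
    env = os.environ if env is None else env
    # Hypothetical variable names injected by the platform.
    host = env["AGENT_PG_HOST"]
    port = env.get("AGENT_PG_PORT", "5432")
    user = env["AGENT_PG_USER"]
    password = env["AGENT_PG_PASSWORD"]
    db = env.get("AGENT_PG_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


# A PostgreSQL driver or checkpoint library can then consume the DSN directly.
```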

How Managed Storage Works

Managed storage differs from storage that you provision in your own tenancy:

  • Service-managed deployment

    Storage is provisioned in the service tenancy and is not exposed for direct external access (for example, through database clients or public endpoints).

  • Application-scoped access

    Only the associated deployed application can access its storage. Access is managed by the platform, so no manual networking or credential configuration is required.

  • Lifecycle integration

    Storage is tied to the lifecycle of the agent:
    • Created when the agent is deployed
    • Scales with the application (where supported)
    • Deleted when the agent is deleted
  • No administrative management

    The platform manages the storage infrastructure. You do not have DBA-level access or control over the underlying resources.

Important

When an agent is deleted, its managed storage is permanently removed and cannot be recovered.

When to Use Customer-Managed Storage

Use customer-managed storage when you need:

  • Independent storage lifecycle
  • Full administrative control
  • Direct access from external systems or tools
  • Custom configuration, extensions, or shared access across applications

In these cases, provision storage in your own VCN and tenancy, and configure the agent to connect to it by using Customer Networking Mode.

This approach provides greater flexibility and control over your infrastructure.