On-Demand and Dedicated Modes for OCI Generative AI Models

OCI Generative AI offers two model serving modes: on-demand and dedicated. Review these topics to select the mode that best fits your use case.

After reviewing this page, see Generative AI Models by Region for the modes available for each model, and see the pricing page for prices.

On-Demand Mode

On-demand mode lets you use supported pretrained foundational models without creating a dedicated AI cluster.

Key features:

  • Pay as you go for each inference call, whether you use the playground or API.
  • Start using Generative AI without provisioning dedicated capacity.
  • Suitable for experimentation, proof of concept, and model evaluation.
  • Available for pretrained models in regions where the model isn't listed as dedicated AI cluster only.

Dynamic Throttling Limit Change for On-Demand Mode

OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access.

This change depends on the following factors:

  • The current maximum throughput supported by the target model.
  • Any unused system capacity at the time of change.
  • Each tenancy's historical throughput usage and any specified override limits set for that tenancy.

Note: Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.

Tip

Because of the dynamic throttling limit change, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potential temporary blocking of the client by the Generative AI service. A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success, following industry best practices and enhancing the overall stability and performance of your integration with the service.
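The back-off strategy described above can be sketched as follows. This is a minimal, service-agnostic illustration: `ThrottledError` and `send_request` are hypothetical stand-ins for the SDK error raised on an HTTP 429 rejection and for your actual inference call.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for the error an SDK raises on an HTTP 429 rejection."""


def call_with_backoff(send_request, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a throttled call with exponential back-off and jitter.

    `send_request` is any callable that raises ThrottledError when the
    service rejects the request because of throttling.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except ThrottledError:
            if attempt == max_retries:
                raise  # out of retries; surface the rejection to the caller
            # Double the delay after each rejection, cap it, and add random
            # jitter so concurrent clients don't retry in lockstep.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay))
```

Because the throttling limits are dynamic and undocumented, tune `base_delay` and `max_retries` to your workload rather than assuming a fixed rate.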

Deprecation for On-Demand Mode

When a model is retired in on-demand mode, it's no longer available for use in the Generative AI service playground or through the Generative AI Inference API.

When a model is deprecated in on-demand mode, it remains available in the Generative AI service, but only for a defined period before it's retired. This period is longer in dedicated mode.

For the OCI Generative AI models, see the model retirement dates (on-demand mode).

Dedicated Mode

In dedicated mode, you get dedicated GPU capacity for hosting and fine-tuning models in OCI Generative AI. Dedicated AI clusters provide predictable performance and are suited for production workloads.

You can use dedicated AI clusters to:

  • Fine-tune supported OCI Generative AI pretrained models.
  • Host OCI Generative AI pretrained models.
  • Host custom models created by fine-tuning supported pretrained models.
  • Host imported models that are compatible with OCI Generative AI.

To access a model in dedicated mode, create an endpoint for the model on a dedicated AI cluster.

Dedicated mode is available for supported models in the regions listed for each model.

Commitment for Dedicated AI Clusters

For OCI Generative AI pretrained and fine-tuned models, dedicated AI clusters require a usage commitment.

  • Hosting clusters: Minimum commitment of 744 unit-hours per hosting cluster.
  • Fine-tuning clusters: Minimum commitment of 1 unit-hour per fine-tuning job. Depending on the model, fine-tuning might require at least 2 units.

Note

Imported models don't require the 744 unit-hour hosting commitment. If you create a dedicated AI cluster to host an imported model, you can host the model without committing to the minimum hosting commitment that applies to OCI Generative AI pretrained and fine-tuned models.
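As a rough illustration of the minimums above, the following sketch computes billable unit-hours. The billing formula (units × hours, floored at the stated minimum) is an assumption for illustration only; the documentation states only the minimums, so consult the pricing page for actual billing.

```python
# Minimum commitments for dedicated AI clusters, per the section above.
HOSTING_MIN_UNIT_HOURS = 744.0      # per hosting cluster (31 days x 24 hours)
FINE_TUNING_MIN_UNIT_HOURS = 1.0    # per fine-tuning job


def hosting_unit_hours(units: int, hours: float) -> float:
    """Billable unit-hours for one hosting cluster, applying the
    744 unit-hour minimum commitment (assumed formula: units x hours,
    floored at the minimum)."""
    return max(HOSTING_MIN_UNIT_HOURS, units * hours)


def fine_tuning_unit_hours(units: int, hours: float) -> float:
    """Billable unit-hours for one fine-tuning job, applying the
    1 unit-hour minimum. Some models require at least 2 units."""
    return max(FINE_TUNING_MIN_UNIT_HOURS, units * hours)


# A 2-unit hosting cluster used for one week (168 hours) consumes
# 336 unit-hours, but the 744 unit-hour minimum still applies.
print(hosting_unit_hours(2, 168))   # 744.0
```

Note that, as stated above, this minimum does not apply to clusters hosting imported models.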

Retirement for Dedicated Mode

When a model is retired in dedicated mode, you can no longer create a dedicated AI cluster for that model, but active dedicated AI clusters that are running the retired model continue to run. A custom model that was created from a retired model also remains available on active dedicated AI clusters, and you can continue to create new dedicated AI clusters with such a custom model. However, Oracle offers limited support for these scenarios, and Oracle engineering might ask you to upgrade to a supported model to resolve issues related to your model.

To request that a model remain available in dedicated mode beyond its retirement date, create a support ticket.

For the OCI Generative AI models, see the model retirement dates (dedicated mode).

Deprecation for Dedicated Mode

When a model is deprecated in dedicated mode, it remains available in the Generative AI service, but only for a defined period before it's retired. The dedicated mode deprecation period is longer than the on-demand deprecation period for the same model.