Dynamic Throttling
OCI Generative AI automatically adjusts each tenancy's request throttling limits for models running in on-demand mode based on model demand and system capacity.
If you're unfamiliar with the on-demand mode, see Offered modes for models.
Dynamic Throttling Limit Change for On-Demand Mode
OCI Generative AI dynamically adjusts the request throttling limit for each active tenancy based on model demand and system capacity to optimize resource allocation and ensure fair access.
This change depends on the following factors:
- The current maximum throughput supported by the target model.
- Any unused system capacity at the time of change.
- Each tenancy's historical throughput usage and any specified override limits set for that tenancy.
Because of dynamic throttling, rate limits are undocumented and can change to meet system-wide demand.
Tip: Implement Back-off and Retry Strategies
Because throttling limits change dynamically, we recommend implementing a back-off strategy, which delays requests after a rejection. Without one, repeated rapid requests can lead to further rejections, increased latency, and potentially temporary blocking of the client by the Generative AI service.
A back-off strategy, such as exponential back-off, distributes requests more evenly, reduces load, and improves retry success. This follows industry best practices and improves the overall stability and performance of your integration with the service.
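As a rough illustration of the tip above, the following Python sketch retries a throttled call with exponential back-off plus random jitter. It is not taken from the OCI SDK: `ThrottledError`, `backoff_delays`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and the sketch assumes a throttling rejection reaches your code as an exception (for example, one mapped from an HTTP 429 response).

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling rejection (for example, HTTP 429)."""


def backoff_delays(max_retries, base_delay=1.0, max_delay=30.0, rng=random.random):
    """Yield one jittered delay per retry.

    The ceiling doubles on each attempt (base_delay * 2**attempt) and is capped
    at max_delay; the actual delay is a random fraction of that ceiling.
    """
    for attempt in range(max_retries):
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        yield rng() * ceiling


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send_request(); on ThrottledError, wait and retry.

    Makes at most max_retries retries after the first attempt, then re-raises
    the last ThrottledError so the caller can handle it.
    """
    delays = backoff_delays(max_retries, base_delay, max_delay)
    while True:
        try:
            return send_request()
        except ThrottledError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted; surface the rejection
            sleep(delay)
```

In a real integration, `send_request` would wrap the actual call to the service, and the `except` clause would catch whatever exception your client raises for a throttled request. The jitter is a design choice: it keeps many clients that were rejected at the same moment from retrying in lockstep.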