Retries

By default, operations exposed in the SDK do not retry, but retries can be set in the SDK on a per-operation basis. Each operation accepts a retry_strategy keyword argument which can be used to set the retry strategy for that operation. This retry strategy could be:

A sample on using retries, including the default strategy and a custom strategy, can be found on GitHub

Exponential Backoff

The client application must implement retries responsibly. If there are N clients retrying for the same resource the work done by service increases proportionally to N^2 as N clients retry in first round, N-1 in second and so on. This quadratic increase in workload can overwhelm the system and can cause further degradation of a service which might already be under stress. A common strategy to avoid this is to use exponential backoff. This strategy essentially makes the client wait progressively longer after each consecutive retry which is exponential in nature.

The wait interval with exponential backoff would be as below:-

exponential_backoff_sleep_base = min(base_time * (exponent ** attempt_number), max_wait_time)

Introducing Jitter

Exponential backoff solves the problem of overwhelming the service by spreading the retries over a longer interval of time; however, the N clients still retry in lockstep, albeit with retries spaced exponentially farther apart. To remove this synchronous behavior of the retrying clients we can add jitter, which adds randomness, to the wait interval helping these clients to avoid collision.

There are different strategies to implement these timed backoff delays as mentioned below.

Full Jitter

Instead of using a constant time we can instead use a random value between 0 and the exponential backoff time.

Exponential backoff with full jitter is used for other scenarios where we need to retry because of a failure (e.g. timeouts, HTTP 5xx). The sleep time in this circumstance is calculated as:

exponential_backoff_sleep_base = min(base_time * (exponent ** attempt_number), max_wait_time)
sleep_time = random(0, exponential_backoff_sleep_base)

Equal Jitter

In this strategy we keep some amount of the original backoff and jitter on smaller amount. The intuition behind this it to avoid short sleep scenarios which can again lead to overwhelming the service.

Exponential backoff with equal jitter is used for throttles as this guarantees some sleep time between attempts. The sleep time in this circumstance is calculated as:

exponential_backoff_sleep_base = min(base_time * (exponent ** attempt_number), max_wait_time)
sleep_time = (exponential_backoff_sleep_base / 2.0) + random(0, exponential_backoff_sleep_base / 2.0)

De-correlated Jitter

In this strategy we keep some amount of De-correlated Jitter (default 1 second) to reduce the number of collisions between the subsequent retrying calls.

exponential_backoff_sleep_base = base_time * (exponent ** (attempt_number - 1))
sleep_time = min(exponential_backoff_sleep_base + random.uniform(0, decorrelated_jitter), max_wait_between_calls_seconds)

Default Retry Strategy

The default retry strategy vended by the SDK has the following attributes:

  • 8 total attempts

  • Total allowed elapsed time for all requests of 600 seconds (10 minutes)

  • Exponential backoff with de-correlated jitter of 1000 ms, using:

    • The base time to use in retry calculations will be 1 second
    • An exponent of 2. When calculating the next retry time we will raise this to the power of the number of attempts
    • A maximum wait time between calls of 30 seconds
  • Retries on the following exception types:

    • Timeouts and connection errors
    • HTTP 409 (IncorrectState)
    • HTTP 429s (throttles)
    • HTTP 5xx (server errors), except 501

Customizing Retry Strategy

As mentioned above, users can create there own custom retry strategy using RetryStrategyBuilder class.

An example for this is below:-

custom_retry_strategy = oci.retry.RetryStrategyBuilder(
    # Make up to 10 service calls
    max_attempts_check=True,
    max_attempts=10,

    # Don't exceed a total of 600 seconds for all service calls
    total_elapsed_time_check=True,
    total_elapsed_time_seconds=600,

    # Wait 45 seconds between attempts
    retry_max_wait_between_calls_seconds=45,

    # Use 2 seconds as the base number for doing sleep time calculations
    retry_base_sleep_time_seconds=2,

    # Retry on certain service errors:
    #
    #   - 5xx code received for the request
    #   - Any 429 (this is signified by the empty array in the retry config)
    #   - 400s where the code is QuotaExceeded or LimitExceeded
    service_error_check=True,
    service_error_retry_on_any_5xx=True,
    service_error_retry_config={
        400: ['QuotaExceeded', 'LimitExceeded'],
        429: []
    },

    # Use exponential backoff and retry with full jitter, but on throttles use
    # exponential backoff and retry with equal jitter
    backoff_type=oci.retry.BACKOFF_FULL_JITTER_EQUAL_ON_THROTTLE_VALUE
).get_retry_strategy()

Overriding the Retry behavior at Operation Level

To use a custom retry strategy for an operation, a custom retry strategy can be passed through the retry_strategy keyword argument.

An Example would be:-

# Default config file and profile
config = oci.config.from_file()
compartment_id = config["tenancy"]

# Service client
identity_client = oci.identity.IdentityClient(config)

# Operation Retry Strategy override
response = identity_client.list_region_subscriptions(compartment_id, retry_strategy=custom_retry_strategy)

# For convenience the Default Retry Strategy vended by the SDK can also be used here
response = identity_client.list_region_subscriptions(compartment_id, retry_strategy=oci.retry.DEFAULT_RETRY_STRATEGY)

To disable retries at Operation level you can use:-

response = identity.list_region_subscriptions(compartment_id, retry_strategy=oci.retry.NoneRetryStrategy())

Overriding the Retry behavior at Client Level

To use a custom retry strategy for all operations for client, a custom retry strategy can be passed through the retry_strategy keyword argument while initializing the client

An Example would be:-

# Default config file and profile
config = oci.config.from_file()
compartment_id = config["tenancy"]

# Service client that uses custom retry strategy for all operations
identity_client = oci.identity.IdentityClient(config, retry_strategy=custom_retry_strategy)

# For convenience the Default Retry Strategy vended by the SDK can also be used here
identity_client = oci.identity.IdentityClient(config, retry_strategy=oci.retry.DEFAULT_RETRY_STRATEGY)

To disable retries at the client level:-

identity_client = oci.identity.IdentityClient(config, retry_strategy=oci.retry.NoneRetryStrategy())

Overriding the Retry behavior at Global/SDK Level

To override the SDK level global retries for service client operations programmatically, a retry strategy can be passed to the variable GLOBAL_RETRY_STRATEGY. This retry strategy can be:

The python SDK also provides a handy way of enabling/disabling retries at global level using environment variables.

# Set the following environment variable to False
OCI_SDK_DEFAULT_RETRY_ENABLED=False

# Setting the environment variable to True will enable retries with DEFAULT_RETRY_STRATEGY
OCI_SDK_DEFAULT_RETRY_ENABLED=True

Retry Behavior Precedence

The Retry behavior Precedence in Python SDK (Highest to lowest) is defined as below:-

  • Operation level retry strategy
  • Client level retry strategy
  • Global level retry strategy set using oci.retry.GLOBAL_RETRY_STRATEGY
  • Environment level override via the OCI_SDK_DEFAULT_RETRY_ENABLED environment variable

Note

Some services can enable retries for operations by default which would follow the oci.retry.DEFAULT_RETRY_STRATEGY. This can be overridden using any alternatives mentioned above. To know which service operations have retries enabled by default, look at the operation’s description in the SDK - it will say either that it has retries enabled by default, or that it does not have retries enabled by default.