Handle Connection Exceptions with Retries

Sometimes when a cache service is scaled in, scaled out, or restarted, it can’t accept connections for a brief time. A Java application that attempts to connect during that window might get a java.net.ConnectException or a java.net.UnknownHostException. You can code a retry mechanism to handle these exceptions.

This retry mechanism should control both the number of retries attempted before giving up and the time the thread sleeps between attempts.
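As an illustration of those two controls, here is a minimal sketch of a generic retry loop built on java.util.concurrent.Callable. The class Retry and the methods callWithRetries and isConnectionFailure are hypothetical names, not a library API. The sketch assumes that only connection failures should be retried, and it inspects the whole cause chain because client libraries often wrap the underlying java.net exception in their own exception type. Here maxRetries counts retries after the first attempt.

```java
import java.net.ConnectException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical generic retry helper; a sketch, not a library API. */
final class Retry {
    private Retry() { }

    /**
     * Returns true if t, or any exception in its cause chain, is a
     * ConnectException or an UnknownHostException.
     */
    static boolean isConnectionFailure(Throwable t) {
        // Identity set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof ConnectException || cur instanceof UnknownHostException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs op, retrying up to maxRetries times after the first attempt when the
     * failure is a connection failure. Sleeps sleepMs between attempts.
     * Other exceptions, and the last connection failure, are rethrown.
     */
    static <T> T callWithRetries(Callable<T> op, int maxRetries, long sleepMs) throws Exception {
        int retriesLeft = maxRetries;
        while (true) {
            try {
                return op.call();
            } catch (Exception ex) {
                if (!isConnectionFailure(ex) || retriesLeft-- <= 0) {
                    throw ex; // Not transient, or retry limit exceeded.
                }
                // Wait before the next attempt; an InterruptedException propagates.
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

With maxRetries set to 2, an operation that keeps failing with UnknownHostException runs three times before the exception is rethrown, while an IllegalStateException is rethrown after the first attempt without any retry.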

The following simplified example of a retry mechanism, written in Java, is meant to serve as a guide. It uses a helper class, RetryOnException.

/**
 * Encapsulates retry-on-exception operations
 */
public class RetryOnException {
    public static final int DEFAULT_RETRIES = 30;
    public static final long DEFAULT_TIME_TO_WAIT_MS = 2000;

    private int numRetries;
    private long timeToWaitMS;

    // CONSTRUCTORS
    public RetryOnException(int _numRetries,
                            long _timeToWaitMS)
    {
        numRetries = _numRetries;
        timeToWaitMS = _timeToWaitMS;
    }

    public RetryOnException()
    {
        this(DEFAULT_RETRIES, DEFAULT_TIME_TO_WAIT_MS);
    }

    /**
     * shouldRetry
     * Returns true if a retry can be attempted.
     * @return  True if retry attempts remain; else false
     */
    public boolean shouldRetry()
    {
        return (numRetries >= 0);
    }

    /**
     * waitUntilNextTry
     * Waits for timeToWaitMS. If the thread is interrupted, restores
     * its interrupt status and returns early instead of discarding it.
     */
    public void waitUntilNextTry()
    {
        try {
            Thread.sleep(timeToWaitMS);
        }
        catch (InterruptedException iex) {
            // Preserve the interrupt so that callers can detect it.
            Thread.currentThread().interrupt();
        }
    }

    /**
     * exceptionOccurred
     * Call when an exception has occurred in the block. If the
     * retry limit is exceeded, throws an exception.
     * Else waits for the specified time.
     * @throws Exception
     */
    public void exceptionOccurred() throws Exception
    {
        numRetries--;
        if(!shouldRetry())
        {
            throw new Exception("Retry limit exceeded.");
        }
        waitUntilNextTry();
    }
}
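To see how the retry count behaves, the following sketch drives a condensed version of RetryOnException, with the same counting logic, against a simulated operation that always fails. The condensed class is included only so the block compiles on its own. RetryCountDemo, alwaysFails, and countAttempts are hypothetical names used for illustration.

```java
import java.net.ConnectException;

/**
 * Condensed version of RetryOnException with the same counting logic,
 * included so this block compiles on its own.
 */
class RetryOnException {
    private int numRetries;
    private final long timeToWaitMS;

    RetryOnException(int numRetries, long timeToWaitMS) {
        this.numRetries = numRetries;
        this.timeToWaitMS = timeToWaitMS;
    }

    boolean shouldRetry() {
        return numRetries >= 0;
    }

    void exceptionOccurred() throws Exception {
        numRetries--;
        if (!shouldRetry()) {
            throw new Exception("Retry limit exceeded.");
        }
        try {
            Thread.sleep(timeToWaitMS);
        } catch (InterruptedException iex) {
            Thread.currentThread().interrupt();
        }
    }
}

/** Hypothetical demo: counts attempts against an operation that always fails. */
final class RetryCountDemo {
    private RetryCountDemo() { }

    /** Simulated connection that is always refused. */
    static void alwaysFails() throws ConnectException {
        throw new ConnectException("simulated connection refusal");
    }

    /** Returns how many attempts run before exceptionOccurred() gives up. */
    static int countAttempts(int numRetries) {
        RetryOnException handler = new RetryOnException(numRetries, 1);
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                alwaysFails();
            } catch (ConnectException ex) {
                try {
                    handler.exceptionOccurred();
                } catch (Exception limitExceeded) {
                    return attempts; // "Retry limit exceeded."
                }
            }
        }
    }
}
```

Here countAttempts(3) returns 4 and countAttempts(0) returns 1, so the numRetries argument counts retries after the initial attempt: a handler created with 20 retries allows up to 21 attempts.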

Here is a Java method, getWithRetries, that shows how to wrap a REST GET operation in the retry handler. CACHE_NAME is a String variable that holds the name of the cache to access. Tune the retries and retrySleep values for your application. Testing suggests that connectivity exceptions occur within a window of at most about 5 seconds, so reasonable starting values are 20 for retries and 500 milliseconds for retrySleep. Together, these allow up to 20 × 500 ms = 10 seconds of waiting, twice the observed window, before the method gives up.

    // Requires: javax.ws.rs.ProcessingException, javax.ws.rs.client.WebTarget,
    // javax.ws.rs.core.MediaType, javax.ws.rs.core.Response
    // (jakarta.ws.rs.* in Jakarta EE 9 and later)

    /**
     * getWithRetries
     * Issues a REST GET with retries. Throws an Exception if the
     * GET does not succeed within the allowed number of retries.
     * If the GET succeeds but returns any status other than HTTP 200,
     * returns null. Otherwise, returns the value.
     * @param target     WebTarget for the REST call
     * @param key        The cache key to fetch
     * @param retries    Number of retries to attempt
     * @param retrySleep Sleep time in milliseconds between each attempt
     * @return  String value, or null if the GET did not return an HTTP 200 code
     * @throws Exception if the retry limit is exceeded
     */
    public String getWithRetries(WebTarget target,
                                 String key,
                                 int retries,
                                 long retrySleep) throws Exception
    {
        Response getResponse;

        // For handling retries
        RetryOnException retryHandler = new RetryOnException(retries, retrySleep);

        while (true) {
            try {
                getResponse = target
                        .path(CACHE_NAME + "/" + key)
                        .request(MediaType.APPLICATION_OCTET_STREAM)
                        .get();
                break;
            }
            // JAX-RS clients such as Jersey report connection failures
            // (for example, java.net.ConnectException) as a ProcessingException.
            // Wait and retry; beyond the retry limit, this throws an Exception.
            catch (ProcessingException ex)
            {
                retryHandler.exceptionOccurred();
            }
        }

        // If the status is not 200, release the response and return null.
        if (getResponse.getStatus() != Response.Status.OK.getStatusCode())
        {
            getResponse.close();
            return null;
        }

        // readEntity() consumes the entity and closes the response.
        return getResponse.readEntity(String.class);
    }
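The starting values suggested above can be checked with a little arithmetic. A RetryOnException handler created with R retries and a sleep of S milliseconds sleeps at most R times, so its worst-case total sleep is R × S. The following hypothetical helper, RetryTuning, makes that explicit and works backward from an observed outage window. The safety factor of 2 is an assumption, chosen here because it reproduces the suggested 20 retries at 500 ms for a 5-second window.

```java
/** Hypothetical helper for choosing starting retry values; a sketch only. */
final class RetryTuning {
    private RetryTuning() { }

    /**
     * Returns the smallest retry count such that retries * sleepMs covers
     * windowMs multiplied by safetyFactor (an assumed padding).
     */
    static int retriesFor(long windowMs, long sleepMs, double safetyFactor) {
        if (sleepMs <= 0) {
            throw new IllegalArgumentException("sleepMs must be positive");
        }
        return (int) Math.ceil(windowMs * safetyFactor / sleepMs);
    }

    /** Worst-case total sleep, in milliseconds, before the retry limit is reached. */
    static long totalSleepMs(int retries, long sleepMs) {
        return retries * sleepMs;
    }
}
```

For example, retriesFor(5000, 500, 2.0) returns 20, and totalSleepMs(20, 500) returns 10000. For the RetryOnException defaults (30 retries, 2000 ms), totalSleepMs returns 60000, about a minute. These figures ignore the time each failed connection attempt itself takes, so the actual time before giving up is somewhat longer.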