Implementing retries with exponential backoff

A call to an external service can fail for many reasons: a network failure, an overloaded service, a problem in our own business logic, and many other possible scenarios.

When a failure is detected, there are multiple strategies to handle it. We can cancel the request, or we can retry it until we get the desired result or until we cannot afford to send more requests. How many times we retry and how long we wait before sending the next request depend on our business needs.

One common retry strategy is exponential backoff. When a request fails, we don’t send the next request immediately; instead, we “back off” and wait for a delay that grows exponentially with each attempt. Once the delay reaches a configured maximum, we stop increasing it.

The waiting time for each retry is simply expressed by the exponential function:

$t\:=\:b^c$

where $t$ is the delay, $b$ is the base (the multiplicative factor), and $c$ is the number of failed attempts so far. For example, with delays measured in seconds and $b = 2$, after 3 failed attempts we wait $2^3 = 8$ seconds before sending the fourth request.
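To make the formula concrete, here is a small sketch that prints the delay schedule for a given base and a maximum delay. The class and method names (BackoffSchedule, delaySeconds, schedule) are illustrative assumptions and are not part of the implementation that follows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, used only to illustrate t = b^c with an upper bound.
final class BackoffSchedule {

    // Delay in seconds after `attempts` failed tries: min(base^attempts, capSeconds).
    static long delaySeconds(int base, int attempts, long capSeconds) {
        double raw = Math.pow(base, attempts);
        return (long) Math.min(raw, capSeconds);
    }

    // Delays between consecutive requests when at most maxAttempts requests are sent.
    static List<Long> schedule(int base, int maxAttempts, long capSeconds) {
        List<Long> delays = new ArrayList<>();
        for (int c = 1; c < maxAttempts; c++) {
            delays.add(delaySeconds(base, c, capSeconds));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Base 2, at most 5 requests, delays capped at 10 seconds.
        System.out.println(schedule(2, 5, 10)); // prints [2, 4, 8, 10]
    }
}
```

With base 2 and a 10-second cap, the fourth delay would be 16 seconds without the cap, so it is clamped to 10.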

When should you retry?

Retrying makes sense in situations such as the following (the list is not exhaustive):

The cause of the failure is well understood.
A request that failed could succeed on a subsequent attempt.
The cost of a failed request is greater than the cost of retrying.
The faults are expected to be short-lived.

Implementation

We will create a simple retry with exponential backoff in Java. An HTTP client sends a request to an HTTP server, and the request can succeed or fail. When it fails, we resend it until a success message arrives or until we cannot afford any more attempts.

First, we define the BackoffStrategy interface, which has exactly one method, get():

public interface BackoffStrategy {
    /**
     * Tests the supplier value with the provided predicate, if the predicate 
     * evaluates to false, then the supplier value is repeatedly provided to the predicate
     * till it's evaluated to true or maximum number of attempts are reached, then 
     * return the supplier value wrapped in an Optional, which is potentially an empty optional.
     * 
     * @param supplier the action to be performed that potentially can fail
     * @param predicate the predicate to test the supplier value
     * @param numAttempts maximum number of attempts to perform the supplier action
     *                    
     * @return the supplier value wrapped in an optional, possibly empty.
     * @param <T> return type of the supplier value 
     */
    <T> Optional<T> get(Supplier<T> supplier, Predicate<T> predicate, int numAttempts);
}

The first parameter supplies the result we want to get; in this case, it sends a request to an endpoint and receives either a “SUCCESS” or a “FAILURE” message. The second parameter tests the value provided by the first, and the third sets the maximum number of requests we want to make. Next comes ExponentialBackoffStrategy:

public class ExponentialBackoffStrategy implements BackoffStrategy {
    private static final Logger log = LoggerFactory.getLogger(ExponentialBackoffStrategy.class);
    private final TimeDelayProvider timeDelayProvider;

    public ExponentialBackoffStrategy(TimeDelayProvider timeDelayProvider) {
        this.timeDelayProvider = timeDelayProvider;
    }

    @Override
    public <T> Optional<T> get(Supplier<T> supplier, Predicate<T> predicate, int maxAttempts) {
        T t = supplier.get();
        int attempts = 1;
        while (!predicate.test(t) && attempts < maxAttempts) {
            try {
                long time = timeDelayProvider.getDelay(attempts);
                log.info("Predicate tested fail!! Retry in: {}ms", time);
                Thread.sleep(time);
                t = supplier.get();
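            } catch (InterruptedException e) {
                // Only an interrupted sleep should restore the interrupt flag.
                log.error("Interrupted while waiting to retry", e);
                Thread.currentThread().interrupt();
                return Optional.empty();
            } catch (RuntimeException e) {
                log.error("Fail to get the result: {}", e.getMessage(), e);
                return Optional.empty();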
            } finally {
                ++attempts;
            }
        }
        log.info("Total requests have tried: {}", attempts);
        // ofNullable avoids a NullPointerException if the supplier returned null
        return Optional.ofNullable(t);
    }
}

Here we introduce one dependency to this class, TimeDelayProvider. It gives us the delay for each subsequent request based on the exponential function mentioned at the beginning:

public class ExponentialTimeDelayProvider implements TimeDelayProvider {
    private final int base;
    private final long maxBackOffTime;

    public ExponentialTimeDelayProvider(int base, long maxBackOffTime) {
        this.base = base;
        this.maxBackOffTime = maxBackOffTime;
    }

    @Override
    public long getDelay(int noAttempts) {
        // base^noAttempts seconds; converted to milliseconds below
        double pow = Math.pow(base, noAttempts);
        // Up to one second of random jitter. ThreadLocalRandom.current() is called
        // here because the returned instance must not be shared across threads.
        int extraDelay = ThreadLocalRandom.current().nextInt(1000);
        return (long) Math.min(pow * 1000 + extraDelay, maxBackOffTime);
    }

    @Override
    public long maxBackoff() {
        return this.maxBackOffTime;
    }
}

The getDelay method calculates the delay in milliseconds for the given number of attempts. It also adds a random extra delay of up to one second so that clients that failed at the same moment do not all retry at the same time. The maxBackoff method returns the maximum delay that a request has to wait.
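The TimeDelayProvider interface itself is not listed in this article. Judging from the two methods that ExponentialTimeDelayProvider overrides, it presumably looks something like the sketch below. The Javadoc wording and the FixedTimeDelayProvider class are assumptions added for illustration; a constant-delay provider like this can be handy for keeping unit tests fast.

```java
// Assumed shape of the interface, inferred from the methods overridden above.
interface TimeDelayProvider {
    /** Delay in milliseconds to wait after the given number of failed attempts. */
    long getDelay(int noAttempts);

    /** Upper bound, in milliseconds, on any delay returned by getDelay. */
    long maxBackoff();
}

// Hypothetical provider that always returns the same delay.
final class FixedTimeDelayProvider implements TimeDelayProvider {
    private final long delayMillis;

    FixedTimeDelayProvider(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    @Override
    public long getDelay(int noAttempts) {
        return delayMillis;
    }

    @Override
    public long maxBackoff() {
        return delayMillis;
    }
}
```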

Next, we create a sender that will send a request to the HTTP server (which we will create shortly), and the response will be a string of either “SUCCESS” or “FAILURE”:

public class RetrySender {
    private static final Logger log = LoggerFactory.getLogger(RetrySender.class);
    private static final String API_ENDPOINT = "http://127.0.0.1:8080/retry";

    private static final CloseableHttpClient client = HttpClientBuilder.create().build();
    private final BackoffStrategy backOffStrategy;
    private final int maxAttempts;

    public RetrySender(BackoffStrategy backOffStrategy, int maxAttempts) {
        this.backOffStrategy = backOffStrategy;
        this.maxAttempts = maxAttempts;
    }

    public String getStatus(String uuid, int succeedOn) {
        String endpoint = String.format("%s?uuid=%s&succeedWhen=%d", API_ENDPOINT, uuid, succeedOn);
        Optional<String> maybeResponse = backOffStrategy.get(() -> sendRequest(endpoint), "SUCCESS"::equals, maxAttempts);
        return maybeResponse.orElse("FAILURE");
    }

    private String sendRequest(String endpoint) {
        HttpGet httpGet = new HttpGet(endpoint);
        try (CloseableHttpResponse execute = client.execute(httpGet)) {
            InputStream is = execute.getEntity().getContent();
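            // Decode explicitly as UTF-8 instead of relying on the platform default charset
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);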
        } catch (Exception e) {
            log.error("Fail to execute request to: {}", endpoint, e);
            return "FAILURE";
        }
    }
}

Notice that it takes the BackoffStrategy that we created earlier. The getStatus method takes two parameters: the first, uuid, identifies the request sent to the server, and the second, succeedOn, is the number of tries needed before the server reports success. For example, if we pass 3, the success message is returned on the third try.

Inside the method, we call the BackoffStrategy. The predicate "SUCCESS"::equals evaluates to true if the supplier returns “SUCCESS” and to false otherwise. The sendRequest method simply sends a request to our HTTP server and returns the response body as a string.
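The supplier and predicate contract can also be tried out without any HTTP involved. The sketch below is a simplified stand-in for BackoffStrategy.get that runs the same retry loop without sleeping, plus a flaky supplier that starts returning “SUCCESS” on a chosen call. RetryContractDemo and flaky are hypothetical names used only for this illustration.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetryContractDemo {

    // Simplified stand-in for BackoffStrategy.get: same retry loop, no delay.
    static <T> Optional<T> get(Supplier<T> supplier, Predicate<T> predicate, int maxAttempts) {
        T value = supplier.get();
        int attempts = 1;
        while (!predicate.test(value) && attempts < maxAttempts) {
            value = supplier.get();
            attempts++;
        }
        return Optional.ofNullable(value);
    }

    // Hypothetical flaky call: returns "FAILURE" until the succeedOn-th invocation.
    static Supplier<String> flaky(AtomicInteger calls, int succeedOn) {
        return () -> calls.incrementAndGet() >= succeedOn ? "SUCCESS" : "FAILURE";
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        String status = get(flaky(calls, 3), "SUCCESS"::equals, 5).orElse("FAILURE");
        System.out.println(status + " after " + calls.get() + " calls"); // prints SUCCESS after 3 calls
    }
}
```

If the supplier never satisfies the predicate, the loop stops after maxAttempts calls and the last value is returned.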

Next, we create our HTTP server in which we use the existing HttpServer class of the com.sun.net.httpserver package:

public class SimpleHttpServer {
    private final SimpleHttpHandler handler;
    
    public SimpleHttpServer(SimpleHttpHandler handler) {
        this.handler = handler;
    }
    
    public void run() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 8080), 0);
        server.createContext("/retry", handler);
        server.start();
    }
}

To handle the requests from the RetrySender client, we define the RequestHandler interface, which has two methods:

public interface RequestHandler {
    String handle(RetryRequest request);
    int getTries(String uuid);
}

The handle method returns either “SUCCESS” or “FAILURE” based on the RetryRequest provided, while getTries gives us the number of requests that have been made for the given uuid:

public class RetryRequestHandler implements RequestHandler {
    private final Map<String, Integer> idToRequests = new ConcurrentHashMap<>();
    
    @Override
    public String handle(RetryRequest request) {
        String uuid = request.uuid();
        // merge returns the updated count, so no second (racy) lookup is needed
        int numRequests = idToRequests.merge(uuid, 1, Integer::sum);
        if (numRequests >= request.successWhen() && request.successWhen() != -1) {
            return "SUCCESS";
        } else {
            return "FAILURE";
        }
    }

    @Override
    public int getTries(String uuid) {
        return this.idToRequests.getOrDefault(uuid, 0);
    }
}

In the handle method, each time a request arrives, we increment the counter stored for its uuid, creating the entry if it does not exist yet. To simulate request failures and exercise our retry logic, we return “FAILURE” while the total number of requests is below the successWhen threshold (explained above) or when successWhen is -1; otherwise, we return “SUCCESS”.
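Two types used by the server side are not listed in this article: RetryRequest and SimpleHttpHandler. The sketch below shows one way they might look. RetryRequest is assumed to be a record, because the code calls request.uuid() and request.successWhen(). SimpleHttpHandler is assumed to be an adapter that reads the uuid and succeedWhen query parameters sent by RetrySender and delegates to a RequestHandler. The parseQuery helper and the 400 response for malformed requests are illustrative choices, not taken from the original code.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Assumed shape, inferred from calls such as request.uuid() and request.successWhen().
record RetryRequest(String uuid, int successWhen) {}

// Repeated from the article so that the sketch compiles on its own.
interface RequestHandler {
    String handle(RetryRequest request);
    int getTries(String uuid);
}

// Hypothetical adapter between com.sun.net.httpserver and RequestHandler.
class SimpleHttpHandler implements HttpHandler {
    private final RequestHandler requestHandler;

    SimpleHttpHandler(RequestHandler requestHandler) {
        this.requestHandler = requestHandler;
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        Map<String, String> params = parseQuery(exchange.getRequestURI().getQuery());
        String uuid = params.get("uuid");
        String body = "FAILURE";
        int status = 400; // treated as malformed unless both parameters parse
        try {
            if (uuid != null) {
                int successWhen = Integer.parseInt(params.get("succeedWhen"));
                body = requestHandler.handle(new RetryRequest(uuid, successWhen));
                status = 200;
            }
        } catch (NumberFormatException e) {
            // Missing or non-numeric succeedWhen: keep the 400 response.
        }
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(bytes);
        }
    }

    // Splits a query string such as "a=1&b=2" into a map of names to values.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }
}
```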

Finally, we’re ready to write some tests for our code. First, we define some constants and the dependencies the tests need:

class ExponentialBackoffStrategyTest {

    static int MAX_ATTEMPTS = 5;
    
    static TimeDelayProvider timeDelayProvider = new ExponentialTimeDelayProvider(2, 10_000);
    static BackoffStrategy exponentialBackoffStrategy = new ExponentialBackoffStrategy(timeDelayProvider);
    static RetrySender retrySender = new RetrySender(exponentialBackoffStrategy, MAX_ATTEMPTS);
    static RequestHandler requestHandler = new RetryRequestHandler();
    static SimpleHttpHandler simpleHttpHandler = new SimpleHttpHandler(requestHandler);
    static SimpleHttpServer simpleHttpServer = new SimpleHttpServer(simpleHttpHandler);


    static RetryRequest succeededFirstTry = new RetryRequest(UUID.randomUUID().toString(), 1);
    static RetryRequest succeededMoreTries = new RetryRequest(UUID.randomUUID().toString(), 3);
    static RetryRequest failOnAllTries = new RetryRequest(UUID.randomUUID().toString(), -1);

    static String SUCCESS = "SUCCESS";
    static String FAILURE = "FAILURE";
    
    @BeforeAll
    static void init() throws IOException {
        simpleHttpServer.run();
    }
}

The first test checks that the request succeeds on the first try:

@Test
void testSuccessfulFirstTry() {
    String status = retrySender.getStatus(succeededFirstTry.uuid(), succeededFirstTry.successWhen());
    assertEquals(1, requestHandler.getTries(succeededFirstTry.uuid()));
    assertEquals(SUCCESS, status);
}

The method runs and passes; we’re then provided with some logs like:

Total requests have tried: 1

Next, we create a request that succeeds only on the third try. After each failure, the delay increases exponentially until it reaches the upper limit, which we set to 10 seconds:

@Test
void testSuccessMoreTries() {
    String status = retrySender.getStatus(succeededMoreTries.uuid(), succeededMoreTries.successWhen());
    assertEquals(3, requestHandler.getTries(succeededMoreTries.uuid()));
    assertEquals(SUCCESS, status);
}

The test passes and we get the following logs:

Predicate tested fail!! Retry in: 2161ms
Predicate tested fail!! Retry in: 4785ms
Total requests have tried: 3

Finally, we test the case in which the request never succeeds. The same request is sent 5 times (MAX_ATTEMPTS), and then the sender gives up and returns “FAILURE”:

@Test
void testAllFails() {
    String status = retrySender.getStatus(failOnAllTries.uuid(), failOnAllTries.successWhen());
    assertEquals(MAX_ATTEMPTS, requestHandler.getTries(failOnAllTries.uuid()));
    assertEquals(FAILURE, status);
}

The delay roughly doubles with each retry until it reaches the 10-second cap. Here is the log for our last test method:

Predicate tested fail!! Retry in: 2987ms
Predicate tested fail!! Retry in: 4422ms
Predicate tested fail!! Retry in: 8447ms
Predicate tested fail!! Retry in: 10000ms
Total requests have tried: 5

The code example is provided here.
