Implementing retries with exponential backoff
A call to an external service can fail for many reasons: a network failure, an overloaded service, a bug in our own business logic, and so on.
When a failure is detected, there are several ways to handle it. We can cancel the request, or we can retry it until we get the desired result or can no longer afford to send more requests. How many times to retry, and how long to wait between attempts, depends on our business needs.
A common strategy is retrying with exponential backoff. When a request fails, we don’t resend it immediately. Instead, we wait for a period that grows exponentially after each attempt, “backing off” from the service to give it time to recover. Once the delay reaches a configured maximum, we stop increasing it.
The waiting time for each retry is expressed by the exponential function:
$t = b^c$
where $t$ is the delay time, $b$ is the base or multiplicative factor, and $c$ is the number of attempts made so far.
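As a small illustration of how quickly the delay grows, the sketch below prints the wait before each retry. It assumes a base of 2, a delay measured in seconds, and a cap of 10 seconds (the values used in the tests later on). The names BackoffSchedule and delaySeconds are hypothetical and not part of the article's code.

```java
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Delay in seconds before retry number `attempt` (1-based): min(base^attempt, capSeconds). */
    static long delaySeconds(int base, int attempt, long capSeconds) {
        double raw = Math.pow(base, attempt);
        return (long) Math.min(raw, capSeconds);
    }

    public static void main(String[] args) {
        // With base 2 and a 10 s cap the waits are 2, 4, 8, 10, 10 seconds.
        for (int attempt = 1; attempt <= 5; attempt++) {
            System.out.println("retry " + attempt + ": wait " + delaySeconds(2, attempt, 10) + "s");
        }
    }
}
```

The cap matters after only a few attempts: without it, the tenth retry with base 2 would wait 1024 seconds.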
When should you retry?
Retrying is worth implementing when (the list is not exhaustive):
- The cause of the failure is well understood.
- A previously failed request has a realistic chance of succeeding on a later attempt.
- The cost of a failed request is greater than the cost of retrying.
- The faults are expected to be short-lived.
Implementation
We will build a simple retry with exponential backoff in Java. An HTTP client sends a request to an HTTP server, and the request can succeed or fail. When it fails, we resend it until a success message arrives or we run out of attempts.
First, we define the BackoffStrategy interface, which has exactly one method, get():
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

public interface BackoffStrategy {
    /**
     * Tests the supplier value with the provided predicate. If the predicate
     * evaluates to false, the supplier is invoked again and its value re-tested
     * until the predicate evaluates to true or the maximum number of attempts is
     * reached. The supplier value is then returned wrapped in an Optional, which
     * is potentially empty.
     *
     * @param supplier the action to be performed that potentially can fail
     * @param predicate the predicate to test the supplier value
     * @param numAttempts maximum number of attempts to perform the supplier action
     * @param <T> return type of the supplier value
     *
     * @return the supplier value wrapped in an optional, possibly empty.
     */
    <T> Optional<T> get(Supplier<T> supplier, Predicate<T> predicate, int numAttempts);
}
The first parameter supplies the value we want: in our case it sends a request to an endpoint and returns either “SUCCESS” or “FAILURE”. The second parameter tests the supplied value, and the third is the maximum number of requests we are willing to make. Next comes ExponentialBackoffStrategy:
public class ExponentialBackoffStrategy implements BackoffStrategy {
    private static final Logger log = LoggerFactory.getLogger(ExponentialBackoffStrategy.class);
    private final TimeDelayProvider timeDelayProvider;

    public ExponentialBackoffStrategy(TimeDelayProvider timeDelayProvider) {
        this.timeDelayProvider = timeDelayProvider;
    }

    @Override
    public <T> Optional<T> get(Supplier<T> supplier, Predicate<T> predicate, int maxAttempts) {
        T t = supplier.get(); // first attempt, sent without any delay
        int attempts = 1;
        while (!predicate.test(t) && attempts < maxAttempts) {
            try {
                long time = timeDelayProvider.getDelay(attempts);
                log.info("Predicate tested fail!! Retry in: {}ms", time);
                Thread.sleep(time);
                t = supplier.get();
            } catch (InterruptedException e) {
                // Restore the interrupt flag only when the sleep was actually interrupted.
                Thread.currentThread().interrupt();
                log.error("Interrupted while waiting to retry", e);
                return Optional.empty();
            } catch (RuntimeException e) {
                log.error("Fail to get the result: {}", e.getMessage(), e);
                return Optional.empty();
            } finally {
                ++attempts;
            }
        }
        log.info("Total requests have tried: {}", attempts);
        // The last value is returned even if it never satisfied the predicate;
        // ofNullable avoids a NullPointerException when the supplier returned null.
        return Optional.ofNullable(t);
    }
}
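To see the retry loop in action without a network or long sleeps, here is a minimal, self-contained sketch. It is not the article's class: RetryDemo, retry, and succeedOn are hypothetical names, and the delay is passed in as a plain function of the attempt number so that a test can use a 1 ms wait.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntToLongFunction;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class RetryDemo {
    private RetryDemo() {}

    /** Retries the supplier until the predicate accepts its value or maxAttempts calls were made. */
    static <T> Optional<T> retry(Supplier<T> supplier, Predicate<T> predicate,
                                 int maxAttempts, IntToLongFunction delayMillis) {
        T value = supplier.get();
        int attempts = 1;
        while (!predicate.test(value) && attempts < maxAttempts) {
            try {
                Thread.sleep(delayMillis.applyAsLong(attempts));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and give up
                return Optional.empty();
            }
            value = supplier.get();
            attempts++;
        }
        return Optional.ofNullable(value);
    }

    /** A fake call that answers "FAILURE" until the given call number, then "SUCCESS". */
    static Supplier<String> succeedOn(int callNumber, AtomicInteger calls) {
        return () -> calls.incrementAndGet() >= callNumber ? "SUCCESS" : "FAILURE";
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        Optional<String> result = retry(succeedOn(3, calls), "SUCCESS"::equals, 5, attempt -> 1L);
        System.out.println(result.orElse("FAILURE") + " after " + calls.get() + " calls");
    }
}
```

Passing the delay in as a function is a design choice for the sketch only; it plays the same role as the TimeDelayProvider dependency in the class above.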
This class has one dependency, TimeDelayProvider, which gives us the delay for each subsequent request based on the exponential function mentioned at the beginning:
public class ExponentialTimeDelayProvider implements TimeDelayProvider {
    private final int base;
    private final long maxBackOffTime;

    public ExponentialTimeDelayProvider(int base, long maxBackOffTime) {
        this.base = base;
        this.maxBackOffTime = maxBackOffTime;
    }

    @Override
    public long getDelay(int noAttempts) {
        double pow = Math.pow(base, noAttempts);
        // Obtain the generator at the point of use: a ThreadLocalRandom
        // should not be cached in a field and shared across threads.
        int extraDelay = ThreadLocalRandom.current().nextInt(1000);
        return (long) Math.min(pow * 1000 + extraDelay, maxBackOffTime);
    }

    @Override
    public long maxBackoff() {
        return this.maxBackOffTime;
    }
}
The getDelay function computes the delay in milliseconds: the base raised to the number of attempts, taken as seconds, plus a random extra delay of up to one second so that clients that failed together do not all retry at the same moment. The result is capped at maxBackOffTime, which maxBackoff() returns as the maximum time a request has to wait.
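The TimeDelayProvider interface itself is not shown in this article. Judging from the two overridden methods, it presumably looks like the sketch below; treat the exact shape as an assumption. FixedTimeDelayProvider is a hypothetical test double, added here only to show how a second implementation could keep unit tests from sleeping for several seconds.

```java
/** Assumed shape of the interface, inferred from the two overridden methods. */
interface TimeDelayProvider {
    /** Delay in milliseconds to wait before the retry that follows attempt number noAttempts. */
    long getDelay(int noAttempts);

    /** Upper bound in milliseconds on any delay returned by getDelay. */
    long maxBackoff();
}

/** Hypothetical test double: always returns the same delay, whatever the attempt number. */
final class FixedTimeDelayProvider implements TimeDelayProvider {
    private final long delayMillis;

    FixedTimeDelayProvider(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        this.delayMillis = delayMillis;
    }

    @Override
    public long getDelay(int noAttempts) {
        return delayMillis;
    }

    @Override
    public long maxBackoff() {
        return delayMillis;
    }
}
```

Because the strategy depends only on the interface, swapping the exponential provider for a fixed one changes the timing without touching the retry loop.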
Next, we create a sender that will send a request to the HTTP server (which we will create shortly), and the response will be a string of either “SUCCESS” or “FAILURE”:
public class RetrySender {
    private static final Logger log = LoggerFactory.getLogger(RetrySender.class);
    private static final String API_ENDPOINT = "http://127.0.0.1:8080/retry";
    private static final CloseableHttpClient client = HttpClientBuilder.create().build();
    private final BackoffStrategy backOffStrategy;
    private final int maxAttempts;

    public RetrySender(BackoffStrategy backOffStrategy, int maxAttempts) {
        this.backOffStrategy = backOffStrategy;
        this.maxAttempts = maxAttempts;
    }

    public String getStatus(String uuid, int succeedOn) {
        String endpoint = String.format("%s?uuid=%s&succeedWhen=%d", API_ENDPOINT, uuid, succeedOn);
        Optional<String> maybeResponse = backOffStrategy.get(() -> sendRequest(endpoint), "SUCCESS"::equals, maxAttempts);
        return maybeResponse.orElse("FAILURE");
    }

    private String sendRequest(String endpoint) {
        HttpGet httpGet = new HttpGet(endpoint);
        try (CloseableHttpResponse execute = client.execute(httpGet)) {
            InputStream is = execute.getEntity().getContent();
            // Decode explicitly as UTF-8 (requires java.nio.charset.StandardCharsets)
            // instead of relying on the platform default charset.
            return new String(is.readAllBytes(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            // Any transport error is reported as a failed attempt so it can be retried.
            log.error("Fail to execute request to: {}", endpoint, e);
            return "FAILURE";
        }
    }
}
Notice that it takes the BackoffStrategy that we created earlier. The getStatus method takes two parameters: the first identifies the request sent to the server, and the second, succeedOn, is the number of tries needed before the server reports success. For example, if we pass 3, the success message is returned on the third try.
Inside the method, we call the BackoffStrategy. The predicate "SUCCESS"::equals evaluates to true if the supplier returns “SUCCESS”, and false otherwise. The sendRequest method simply sends a request to our HTTP server and returns the response body as a string.
Next, we create our HTTP server using the existing HttpServer class from the com.sun.net.httpserver package:
public class SimpleHttpServer {
    private final SimpleHttpHandler handler;

    public SimpleHttpServer(SimpleHttpHandler handler) {
        this.handler = handler;
    }

    public void run() throws IOException {
        // Bind to localhost:8080; a backlog of 0 lets the system pick its default.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 8080), 0);
        server.createContext("/retry", handler);
        server.start();
    }
}
To handle requests from the RetrySender client, we define the RequestHandler interface, which has two methods:
public interface RequestHandler {
    String handle(RetryRequest request);

    int getTries(String uuid);
}
The handle method returns either “SUCCESS” or “FAILURE” based on the RetryRequest provided, while getTries gives us the number of requests that have been made with the given uuid:
public class RetryRequestHandler implements RequestHandler {
    private final Map<String, Integer> idToRequests = new ConcurrentHashMap<>();

    @Override
    public String handle(RetryRequest request) {
        // merge() increments the counter atomically and returns the updated value,
        // so no separate get() is needed.
        int numRequests = idToRequests.merge(request.uuid(), 1, Integer::sum);
        if (numRequests >= request.successWhen() && request.successWhen() != -1) {
            return "SUCCESS";
        } else {
            return "FAILURE";
        }
    }

    @Override
    public int getTries(String uuid) {
        return idToRequests.getOrDefault(uuid, 0);
    }
}
In the handle function, each time a request arrives we increment the counter stored for its uuid, creating the entry if it does not exist yet. To simulate request failures and exercise our retry logic, we return “FAILURE” while the total number of requests is below the successWhen threshold (explained above) or when successWhen is -1; otherwise we return “SUCCESS”.
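Two types used in this walkthrough, RetryRequest and SimpleHttpHandler, are not listed in the article. The sketch below is one plausible way to write them and is an assumption throughout: RetryRequest is taken to be a record because the code calls uuid() and successWhen(), and SimpleHttpHandler is taken to parse the uuid and succeedWhen query parameters that RetrySender puts in the URL. The parse helper and the 400 response for a malformed query are hypothetical additions.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Assumed shape: a record matches the uuid() and successWhen() accessors used in the article. */
record RetryRequest(String uuid, int successWhen) {}

/** Repeated here only so that the sketch compiles on its own. */
interface RequestHandler {
    String handle(RetryRequest request);

    int getTries(String uuid);
}

final class SimpleHttpHandler implements HttpHandler {
    private final RequestHandler requestHandler;

    SimpleHttpHandler(RequestHandler requestHandler) {
        this.requestHandler = requestHandler;
    }

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        String body;
        int status = 200;
        try {
            body = requestHandler.handle(parse(exchange.getRequestURI().getRawQuery()));
        } catch (IllegalArgumentException e) {
            // Hypothetical choice: a malformed query is answered with 400 and "FAILURE".
            status = 400;
            body = "FAILURE";
        }
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = exchange.getResponseBody()) {
            os.write(bytes);
        }
    }

    /** Parses a query string such as "uuid=abc&succeedWhen=3" into a RetryRequest. */
    static RetryRequest parse(String query) {
        if (query == null) {
            throw new IllegalArgumentException("missing query string");
        }
        Map<String, String> params = new HashMap<>();
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                params.put(kv[0], kv[1]);
            }
        }
        String uuid = params.get("uuid");
        String succeedWhen = params.get("succeedWhen");
        if (uuid == null || succeedWhen == null) {
            throw new IllegalArgumentException("uuid and succeedWhen are required");
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        return new RetryRequest(uuid, Integer.parseInt(succeedWhen));
    }
}
```

The handler keeps no state of its own; counting attempts stays in the RequestHandler implementation, which is what lets the tests query getTries directly.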
Finally, we’re ready to test our code. First, we define some constants and the necessary dependencies:
class ExponentialBackoffStrategyTest {
    static int MAX_ATTEMPTS = 5;
    static TimeDelayProvider timeDelayProvider = new ExponentialTimeDelayProvider(2, 10_000);
    static BackoffStrategy exponentialBackoffStrategy = new ExponentialBackoffStrategy(timeDelayProvider);
    static RetrySender retrySender = new RetrySender(exponentialBackoffStrategy, MAX_ATTEMPTS);
    static RequestHandler requestHandler = new RetryRequestHandler();
    static SimpleHttpHandler simpleHttpHandler = new SimpleHttpHandler(requestHandler);
    static SimpleHttpServer simpleHttpServer = new SimpleHttpServer(simpleHttpHandler);
    static RetryRequest succeededFirstTry = new RetryRequest(UUID.randomUUID().toString(), 1);
    static RetryRequest succeededMoreTries = new RetryRequest(UUID.randomUUID().toString(), 3);
    static RetryRequest failOnAllTries = new RetryRequest(UUID.randomUUID().toString(), -1);
    static String SUCCESS = "SUCCESS";
    static String FAILURE = "FAILURE";

    @BeforeAll
    static void init() throws IOException {
        simpleHttpServer.run();
    }
}
We create the first test to check that the request succeeds on the first try:
@Test
void testSuccessfulFirstTry() {
    String status = retrySender.getStatus(succeededFirstTry.uuid(), succeededFirstTry.successWhen());
    assertEquals(1, requestHandler.getTries(succeededFirstTry.uuid()));
    assertEquals(SUCCESS, status);
}
The method runs and passes; we’re then provided with some logs like:
Total requests have tried: 1
Next, we create a request that succeeds only on the third try. After each failure the delay grows exponentially until it reaches the upper limit, which we set to 10 seconds:
@Test
void testSuccessMoreTries() {
    String status = retrySender.getStatus(succeededMoreTries.uuid(), succeededMoreTries.successWhen());
    assertEquals(3, requestHandler.getTries(succeededMoreTries.uuid()));
    assertEquals(SUCCESS, status);
}
The test passes and we get the following logs:
Predicate tested fail!! Retry in: 2161ms
Predicate tested fail!! Retry in: 4785ms
Total requests have tried: 3
Finally, we test the case in which the request never succeeds. The same request is sent 5 times (MAX_ATTEMPTS), and then the sender gives up and returns:
@Test
void testAllFails() {
    String status = retrySender.getStatus(failOnAllTries.uuid(), failOnAllTries.successWhen());
    assertEquals(MAX_ATTEMPTS, requestHandler.getTries(failOnAllTries.uuid()));
    assertEquals(FAILURE, status);
}
The delay roughly doubles with each subsequent request until it hits the 10-second cap. Here is the log for our last test method:
Predicate tested fail!! Retry in: 2987ms
Predicate tested fail!! Retry in: 4422ms
Predicate tested fail!! Retry in: 8447ms
Predicate tested fail!! Retry in: 10000ms
Total requests have tried: 5
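The logged delays can be checked against the formula in getDelay. The sketch below computes the smallest and largest delay each retry can produce, and the total time a caller may spend waiting. It assumes the same settings as the tests (base 2, a 10,000 ms cap, up to 999 ms of random extra delay); BackoffBudget and its methods are hypothetical names.

```java
final class BackoffBudget {
    private BackoffBudget() {}

    /** Smallest possible delay in ms for a retry: extra delay of 0 ms, then the cap. */
    static long minDelay(int base, int attempt, long capMillis) {
        return (long) Math.min(Math.pow(base, attempt) * 1000, capMillis);
    }

    /** Largest possible delay in ms: extra delay of 999 ms (nextInt(1000) excludes 1000), then the cap. */
    static long maxDelay(int base, int attempt, long capMillis) {
        return (long) Math.min(Math.pow(base, attempt) * 1000 + 999, capMillis);
    }

    /** Sums a per-retry bound over the maxAttempts - 1 waits that separate maxAttempts requests. */
    static long totalWait(int base, int maxAttempts, long capMillis, boolean worstCase) {
        long total = 0;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            total += worstCase
                    ? maxDelay(base, attempt, capMillis)
                    : minDelay(base, attempt, capMillis);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("best case:  " + totalWait(2, 5, 10_000, false) + " ms");
        System.out.println("worst case: " + totalWait(2, 5, 10_000, true) + " ms");
    }
}
```

With these settings the four waits add up to between 24,000 ms and 26,997 ms. The log above (2987 + 4422 + 8447 + 10000 = 25,856 ms) falls inside that range.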
The code example is provided here.