java aws-lambda apache-httpclient-4.x firecracker

How do I safely resume with HTTP clients after Coordinated Restore at Checkpoint?


"Serverless" infrastructure such as AWS Lambda uses Coordinated Restore at Checkpoint (CRaC) to improve the startup time of Java programs.

The AWS documentation states:

The state of connections that your function establishes during the initialization phase isn't guaranteed when Lambda resumes your function from a snapshot. Validate the state of your network connections and re-establish them as necessary. In most cases, network connections that an AWS SDK establishes automatically resume. For other connections, review the best practices.

The Spring docs mention:

Leveraging checkpoint/restore of a running application typically requires additional lifecycle management to gracefully stop and start using resources like files or sockets and stop active threads.

I am wondering what I need to do to deal with this when using HttpClient from the standard library or CloseableHttpClient from Apache.
Let's say I send an HTTP request before the snapshot to prime the client. What do I need to do in the afterRestore hook to avoid network-related problems?

@Override
public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
    var response = performPrimingRequest(httpClient);
    System.out.println(response.statusCode());
}

A connection that was established before the snapshot will be closed after restore, and the destination IP might no longer be valid. So I assume I need to recreate the client, or at least clear the connection pool. Is this possible with the standard Java HttpClient? Is anything else required?


Solution

  • I am posting this answer as the result of research; I haven't used CRaC before.

    CRaC looks like a very thin layer, and from this line I understand that the HTTP client, like any other thread, simply gets a nudge to carry on.

    I would suggest adding retry logic. After a restore, the first request will probably hang on the stale HTTP connection and then time out, but a second attempt may get a response. The connection may also require authentication, which would need its own refresh elsewhere, but you get the picture.

    On the other hand, Micronaut appears to have invested more in CRaC. For Spring Boot, you can check this demo.

    So I assume I need to recreate the client, or at least clear the connection pool. Is this possible with the standard Java HttpClient? Is anything else required?

    Yes, a proper refresh will probably be required after restore. You can use actuators as the docs say, and here is another sample that deals with DB connections in Spring.
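
To make the "refresh plus retry" idea concrete, here is a minimal sketch for the standard library java.net.http.HttpClient. It is untested against a real CRaC or SnapStart restore, so treat it as a starting point only. The class name RestorableHttpClient, the helper withRetry, and the 2-second connect timeout are all made up for illustration. The beforeCheckpoint/afterRestore methods only mirror the names of the org.crac.Resource hooks; to wire this up for real you would implement org.crac.Resource (with its Context parameter) and register the instance with the CRaC global context, which is left out here so the sketch depends on the JDK alone.

```java
import java.io.IOException;
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical holder that swaps in a fresh HttpClient after a restore,
 * so pooled connections from before the snapshot are never reused.
 */
class RestorableHttpClient {

    private final AtomicReference<HttpClient> client =
            new AtomicReference<>(newClient());

    private static HttpClient newClient() {
        return HttpClient.newBuilder()
                // Assumed value: a short timeout so a dead peer fails fast.
                .connectTimeout(Duration.ofSeconds(2))
                .build();
    }

    /** Always fetch the client through this method; never cache it elsewhere. */
    HttpClient get() {
        return client.get();
    }

    /** Mirrors Resource.beforeCheckpoint: priming requests would go here. */
    void beforeCheckpoint() {
        // Intentionally empty in this sketch.
    }

    /** Mirrors Resource.afterRestore: drop the old client and its pool. */
    void afterRestore() {
        client.set(newClient());
    }

    /**
     * Runs the call up to maxAttempts times, retrying only on IOException
     * (which covers connect and read failures on a stale connection).
     */
    static <T> T withRetry(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A call site would then look something like RestorableHttpClient.withRetry(() -> holder.get().send(request, bodyHandler), 2), where holder, request and bodyHandler are your own objects. Replacing the whole client is the blunt option. For Apache HttpClient 4.x, a lighter alternative might be to keep the client and call closeExpiredConnections() and closeIdleConnections(0, TimeUnit.MILLISECONDS) on its PoolingHttpClientConnectionManager in the restore hook, but I have not verified how that behaves after a snapshot restore.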