java, spring-boot, google-cloud-run, large-files

Streaming a large CSV file generated in a Spring Boot API to the browser throws a broken pipe exception


I'm working on an endpoint that queries a data source, generates a large CSV file, and streams it to the browser. After roughly 2 GB has been sent through the stream, the following error is thrown in our Cloud Run logs (the API is a Spring Boot service running on Cloud Run in GCP):

logger: "org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/].[dispatcherServlet]"
message: "Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed: java.lang.RuntimeException: org.springframework.web.context.request.async.AsyncRequestNotUsableException: ServletOutputStream failed to write: java.io.IOException: Broken pipe] with root cause"
stack_trace: "java.io.IOException: Broken pipe
    at java.base/sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:62)
    at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:132)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:97)
    at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:53)
    at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:532)
    at org.apache.tomcat.util.net.NioChannel.write(NioChannel.java:122)
    at org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doWrite(NioEndpoint.java:1378)
    at org.apache.tomcat.util.net.SocketWrapperBase.doWrite(SocketWrapperBase.java:764)
    at org.apache.tomcat.util.net.SocketWrapperBase.writeBlocking(SocketWrapperBase.java:589)
    at org.apache.tomcat.util.net.SocketWrapperBase.write(SocketWrapperBase.java:533)
    at org.apache.coyote.http11.Http11OutputBuffer$SocketOutputBuffer.doWrite(Http11OutputBuffer.java:533)
    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:112)
    at org.apache.coyote.http11.Http11OutputBuffer.doWrite(Http11OutputBuffer.java:193)
    at org.apache.coyote.Response.doWrite(Response.java:622)

I can recreate this issue by calling this dummy endpoint:

@GetMapping("/searches/requests/large-file-gzip")
public void exportGzip(HttpServletResponse response) {

    // The body is gzip-compressed on the fly; the Content-Encoding header lets
    // the browser decompress it transparently while saving gzipped.csv.
    response.setHeader(HttpHeaders.CONTENT_ENCODING, "gzip");
    response.setContentType(MediaType.TEXT_PLAIN_VALUE);
    response.setCharacterEncoding(StandardCharsets.UTF_8.displayName());
    response.setHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=gzipped.csv");
    response.setStatus(200);

    try (GZIPOutputStream gzipOutputStream = new GZIPOutputStream(response.getOutputStream())) {

        // 3,500,000 rows x 24 UUIDs (36 chars + 1 separator each) = about 3.1 GB uncompressed.
        for (int i = 0; i < 3500000; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 24; j++) {
                sb.append(UUID.randomUUID());
                sb.append(j == 23 ? "\n" : ",");
            }
            gzipOutputStream.write(sb.toString().getBytes(StandardCharsets.UTF_8));
        }
        gzipOutputStream.finish();
    } catch (IOException e) {
        // When the connection is cut (the Broken pipe above), the write fails
        // with an IOException, which is rethrown here and logged by Tomcat.
        throw new RuntimeException(e);
    }
}
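To see how much of the elapsed time is spent just generating and compressing the data, independent of Cloud Run and the network, the loop from the dummy endpoint can be lifted into a plain method that writes to any OutputStream and timed locally. This is a minimal sketch using only the JDK (Java 11+ for `OutputStream.nullOutputStream()`); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.zip.GZIPOutputStream;

public class CsvGenerationTiming {

    // Same row shape as the dummy endpoint: 24 random UUIDs per line.
    static final int COLUMNS = 24;

    /**
     * Writes `rows` gzip-compressed CSV lines to `out` and returns the number of
     * uncompressed bytes produced. Closes `out`, just as the endpoint's
     * try-with-resources closes the response stream.
     */
    static long writeGzippedCsv(OutputStream out, int rows) throws IOException {
        long uncompressed = 0;
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            for (int i = 0; i < rows; i++) {
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < COLUMNS; j++) {
                    sb.append(UUID.randomUUID());
                    sb.append(j == COLUMNS - 1 ? "\n" : ",");
                }
                byte[] line = sb.toString().getBytes(StandardCharsets.UTF_8);
                gzip.write(line);
                uncompressed += line.length;
            }
        } // close() writes the gzip trailer and closes `out`
        return uncompressed;
    }

    public static void main(String[] args) throws IOException {
        // Row count is a parameter so a smaller run can be timed first.
        int rows = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        // Discard the bytes: this measures generation + compression only, no network.
        OutputStream sink = OutputStream.nullOutputStream();
        long start = System.nanoTime();
        long bytes = writeGzippedCsv(sink, rows);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d rows, %d uncompressed bytes, %.1f s%n", rows, bytes, seconds);
    }
}
```

Passing 3500000 as the argument matches the endpoint and produces 3,108,000,000 uncompressed bytes (888 bytes per row). If the measured time is already a large share of the request's time budget, the download cannot finish in time no matter how fast the network is.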

I've tested calling this endpoint both directly via the Cloud Run service URL and through an external load balancer; both show the same behavior. The download stops before the whole file has been streamed to the browser, and I can open the partially downloaded file.

What should I look for here? The container isn't running out of memory; there are no OOM exceptions in the logs. Is there some default Cloud Run connection timeout that I'm not finding in the docs? Or could this be a Spring Boot configuration issue?


Solution

  • It turns out this was caused by the Cloud Run service's request timeout setting. By default, Cloud Run terminates requests that take longer than 5 minutes (300 seconds); the timeout can be raised, up to 60 minutes, in the service's settings. The size of the response stream didn't matter; it was purely a matter of time.

    I didn't discover this earlier because when Cloud Run times out a request, it doesn't log a timeout error or message in the service's logs. It just kills the request, and to the app running in the container it looks as if the client went away or the network dropped.

    If you're getting broken pipe exceptions in your Cloud Run service's logs for long-running requests, check the service's request timeout setting, because, again, nothing is written to the service's log when a request times out.

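Because Cloud Run logs nothing when it cuts a request, one way to make the link to the timeout visible is to record how many bytes had been written and how long the response had been open at the moment a write fails. This is a minimal sketch using only `java.io`; `ElapsedTimeOutputStream` is a hypothetical helper, not part of Spring or Tomcat.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/**
 * Hypothetical diagnostic wrapper: counts bytes and elapsed time, and when a
 * write or flush fails (e.g. "Broken pipe") rethrows an IOException whose
 * message includes those numbers, keeping the original exception as the cause.
 */
public class ElapsedTimeOutputStream extends FilterOutputStream {
    private final long startNanos = System.nanoTime();
    private long bytesWritten;

    public ElapsedTimeOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        try {
            out.write(b);
        } catch (IOException e) {
            throw annotate(e);
        }
        bytesWritten++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        try {
            out.write(b, off, len); // delegate the whole chunk, not byte by byte
        } catch (IOException e) {
            throw annotate(e);
        }
        bytesWritten += len;
    }

    @Override
    public void flush() throws IOException {
        try {
            out.flush();
        } catch (IOException e) {
            throw annotate(e);
        }
    }

    public long bytesWritten() {
        return bytesWritten;
    }

    public double elapsedSeconds() {
        return (System.nanoTime() - startNanos) / 1e9;
    }

    private IOException annotate(IOException cause) {
        return new IOException(String.format(
                "write failed after %d bytes and %.1f s (compare with the request timeout)",
                bytesWritten, elapsedSeconds()), cause);
    }
}
```

In the dummy endpoint it could wrap the servlet stream, e.g. `new GZIPOutputStream(new ElapsedTimeOutputStream(response.getOutputStream()))`. Placed under the gzip layer it counts compressed bytes, i.e. what actually went over the wire. If the reported elapsed time consistently lands at the configured request timeout regardless of the byte count, the timeout is the likely cause.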