I am using gRPC 1.55.1 and am seeing an issue similar to the one discussed below.
I have set keepalive on the client side as follows:
var channel = ManagedChannelBuilder.forAddress(network.getIp(), network.getPort())
.keepAliveTime(130, TimeUnit.SECONDS)
.maxInboundMessageSize(maxInboundMessageSize)
.maxInboundMetadataSize(maxInboundMetadataSize)
.enableRetry()
.build();
var stub = HelloServiceGrpc.newBlockingStub(channel).withDeadline(Deadline.after(115, TimeUnit.SECONDS));
stub.sayHello();
stub.sayHello();
On the server side, keepAliveTime is also set, as suggested in the GitHub issue above.
Grpc.newServerBuilderForPort(port, InsecureServerCredentials.create())
.addService(new GreeterImpl())
.keepAliveTime(130, TimeUnit.SECONDS)
.build()
.start();
In my case, the client calls server1, and server1 then acts as a client to server2.
I am observing that when the deadline is exceeded in server2, the client in server1 receives the error below, which the final client also receives, as expected:
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: context timed out
at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:271)
at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:252)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:165)
But in some rare cases, the client thread hangs and never receives the deadline-exceeded error from the server. The hung client thread looks like this:
at jdk.internal.misc.Unsafe.park(java.base@17.0.9/Native Method)
- parking to wait for <0x0000000767a53a00> (a io.grpc.stub.ClientCalls$ThreadlessExecutor)
at java.util.concurrent.locks.LockSupport.park(java.base@17.0.9/LockSupport.java:211)
at io.grpc.stub.ClientCalls$ThreadlessExecutor.waitAndDrain(ClientCalls.java:748)
at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:157)
I waited about 2 hours and it did not recover. The only way to recover is to restart the client application. I have observed this issue 3-4 times in the last 2 months.
Can someone let me know
Most likely you are hitting this bug: https://github.com/grpc/grpc-java/issues/10838
The fix is planned for 1.63.
As a workaround, disabling retry on the channel should help.
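A minimal sketch of that workaround, assuming the client channel is built as in the question. `ChannelFactory`, `host`, and `port` are placeholder names, and the message-size settings from the question are left out for brevity. It calls `disableRetry()` explicitly instead of only dropping `enableRetry()`, since I believe recent grpc-java versions enable retry by default.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.concurrent.TimeUnit;

// Hypothetical helper class, for illustration only.
public final class ChannelFactory {

    // host and port stand in for network.getIp() and network.getPort() in the question.
    static ManagedChannel newChannel(String host, int port) {
        return ManagedChannelBuilder.forAddress(host, port)
                .keepAliveTime(130, TimeUnit.SECONDS)
                .disableRetry() // replaces enableRetry() from the question
                .build();
    }
}
```

Once you upgrade to a release that contains the fix, retry can be turned back on.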