c++linuxgrpcgrpc-c++

Finishing active rpc's on server side when signal is handled with long-lived grpc streaming in C++


I'm using long-lived grpc streaming for sending notifications from a server to applications. A client subscribes for particular notifications and then the server sends them back to the client. When some server event occurs, clients receive an immediate notification on that event. This rpc will be active while the client is running.

If SIGINT or SIGTERM will handled, the server will be shut down like this:

  std::unique_ptr<grpc::Server> server(builder.BuildAndStart());

  while (!GetSignalToExit()) {
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }

  server->Shutdown();

It works if all active rpc's was finished (I can detect that long-lived grpc streaming has been cancelled on client side and call Finish() on server side). But it doesn't work if any rpc's still active (client doesn't cancel rpc and I can not call Finish() on server side while client still subscribed). So I have a deadlock in Shutdown():

I0531 14:26:57.547300877  568002 call.cc:1833]               grpc_call_start_batch(call=0x626000000170, ops=0x7f6e190fa3e0, nops=1, tag=0x61c000010660, reserved=(nil))
I0531 14:26:57.547415151  568002 call.cc:1322]               ops[0]: SEND_MESSAGE ptr=0x61200002d4c0
I0531 14:26:57.547552138  568002 completion_queue.cc:701]    cq_end_op_for_next(cq=0x613000000580, tag=0x61c000010660, error=OK, done=0x7f6e1de91ed0, done_arg=0x61300001ff60, storage=0x61300001ffd0)
I0531 14:26:57.547627956  568002 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518018, tv_nsec: 547605372, clock_type: 1 }, reserved=(nil))
I0531 14:26:57.547659786  568007 completion_queue.cc:1083]   RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61c000010660 OK
^CI0531 14:26:58.152863773  568005 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 152835249, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.152871623  568006 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 152850642, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.153774912  568003 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 153756833, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.540090834  568004 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 540060203, clock_type: 1 }, reserved=(nil))
I0531 14:26:58.548524228  567997 init.cc:171]                grpc_init(void)
I0531 14:26:58.548550608  567997 completion_queue.cc:521]    grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0531 14:26:58.548559168  567997 server.cc:1507]             grpc_server_shutdown_and_notify(server=0x612000000f40, cq=0x613000000ac0, tag=0x7ffcd7c151a0)
I0531 14:26:58.548791593  567997 completion_queue.cc:1420]   grpc_completion_queue_shutdown(cq=0x613000000ac0)
I0531 14:26:58.548799653  567997 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000ac0, deadline=gpr_timespec { tv_sec: 9223372036854775807, tv_nsec: 0, clock_type: 0 }, reserved=(nil))
I0531 14:26:58.648809379  568002 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518019, tv_nsec: 648783006, clock_type: 1 }, reserved=(nil))
I0531 14:26:59.253870233  568006 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518020, tv_nsec: 253845351, clock_type: 1 }, reserved=(nil))
...

I'm trying to use Shutdown() with one second deadline, and I see that all rpc's should be forced cencelled with grpc_server_cancel_all_calls(), but it does not give any effect. It is still blocked in grpc_completion_queue_next().

I0531 14:31:22.912214984  568384 call.cc:1833]               grpc_call_start_batch(call=0x626000000170, ops=0x7f450d8f83e0, nops=1, tag=0x61c000010660, reserved=(nil))
I0531 14:31:22.912273535  568384 call.cc:1322]               ops[0]: SEND_MESSAGE ptr=0x6120000211c0
I0531 14:31:22.912376873  568384 completion_queue.cc:701]    cq_end_op_for_next(cq=0x613000000580, tag=0x61c000010660, error=OK, done=0x7f4513691ed0, done_arg=0x61300001ff60, storage=0x61300001ffd0)
I0531 14:31:22.912437076  568384 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518283, tv_nsec: 912413737, clock_type: 1 }, reserved=(nil))
I0531 14:31:22.912579665  568382 completion_queue.cc:1083]   RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61c000010660 OK
^CI0531 14:31:23.520863003  568387 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 520838434, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.521851049  568383 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 521831363, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.521886850  568386 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 521868298, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.546587208  568385 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 546566221, clock_type: 1 }, reserved=(nil))
I0531 14:31:23.913261372  568377 init.cc:171]                grpc_init(void)
I0531 14:31:23.913377031  568377 completion_queue.cc:521]    grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0531 14:31:23.913415879  568377 server.cc:1507]             grpc_server_shutdown_and_notify(server=0x612000000f40, cq=0x613000000ac0, tag=0x7fffbe9f7890)
I0531 14:31:23.914173773  568377 completion_queue.cc:1420]   grpc_completion_queue_shutdown(cq=0x613000000ac0)
I0531 14:31:23.914212184  568377 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000ac0, deadline=gpr_timespec { tv_sec: 1685518284, tv_nsec: 913225652, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.013877278  568384 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 13848147, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.622571986  568387 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 622518754, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.622796108  568386 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 622775445, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.623751461  568383 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 623732426, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.647893715  568385 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 647873697, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.913724663  568377 server.cc:1515]             grpc_server_cancel_all_calls(server=0x612000000f40)
I0531 14:31:24.914575277  568377 completion_queue.cc:701]    cq_end_op_for_next(cq=0x613000000580, tag=0x61a00001ffa0, error=OK, done=0x7f4513691ed0, done_arg=0x613000010360, storage=0x6130000103d0)
I0531 14:31:24.914759989  568387 completion_queue.cc:1083]   RETURN_EVENT[0x613000000580]: OP_COMPLETE: tag:0x61a00001ffa0 OK
I0531 14:31:24.914849649  568387 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518285, tv_nsec: 914838828, clock_type: 1 }, reserved=(nil))
I0531 14:31:24.915312144  568377 completion_queue.cc:701]    cq_end_op_for_next(cq=0x613000000ac0, tag=0x7fffbe9f7890, error=OK, done=0x7f45136b4e00, done_arg=0x612000000f40, storage=0x6060000192f0)
I0531 14:31:25.115158526  568384 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518286, tv_nsec: 115134038, clock_type: 1 }, reserved=(nil))
I0531 14:31:25.723760911  568386 completion_queue.cc:970]    grpc_completion_queue_next(cq=0x613000000580, deadline=gpr_timespec { tv_sec: 1685518286, tv_nsec: 723747543, clock_type: 1 }, reserved=(nil))
...
  1. Why Shutdown() with deadline doesn't cancel all active rpc's?
  2. How can I finish active rpc on server side when signal was handled?

I'm using grpc callback-based asynchronous API
protobuf 3.19.6
gRPC 1.48.4


Solution

  • gRPC logs were confusing me. It was not a deadlock in Shutdown(), it was deadlock on condition variable inside server.