I tried to run my C program on my local Kubernetes cluster, and at first glance everything works as I expect. Only my signal handler for terminating one process doesn't fully work.
There are two processes: one for administration and one for the actual "work". The administration process sends a SIGTERM to the worker process, and the signal handler catches this signal (I can see it in my logs). The only thing that does not work is terminating the worker process.
Here is the signal handler:
void handle_sigterm() {
    error_msg(sip_man_log, "(MAIN) INFO: Closing all sockets.");
    close(connfd);
    close(sockfd);
    close(sockfd_ext);
    free_manipulation_table(int_modification_table);
    free_manipulation_table(ext_modification_table);
    free_manipulation_table(mir_modification_table);
    error_msg(sip_man_log, "(MAIN) INFO: Terminating Server by SIGTERM");
    exit(0);
}
In the logs, the last message is visible, but the process is still active, so I think the "exit(0);" does not work correctly (as I understand it, "exit(0)" should terminate the entire process).
If I run the same code on my local machine, it works as I expect. I'm relatively new to C and Unix programming, so I don't understand what's wrong here.
I found a solution to my problem. I think this was a Kubernetes-specific issue. After some research, I found out that the worker process becomes a zombie after it receives the SIGTERM, and the parent process was not able to clean up the zombie. After adding tini as the init process (in the Dockerfile and deployment.yaml) and making sure it runs as PID 1, the zombies are cleaned up and the worker process terminates when I send the SIGTERM from the admin process.
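For illustration, the zombie behaviour can be sketched in plain C. This is a minimal sketch under assumptions, not the actual program: spawn_worker, worker_on_sigterm and stop_and_reap_worker are hypothetical names, and the worker here does nothing but wait for SIGTERM. The point it shows is that a worker that has already exited stays in the process table as a zombie until its parent (or an init process such as tini) collects its status with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker handler: leave immediately with status 0. */
static void worker_on_sigterm(int signum)
{
    (void)signum;
    _exit(0); /* _exit() is async-signal-safe, exit() is not */
}

/* Hypothetical helper: fork a worker that idles until SIGTERM arrives.
 * SIGTERM is blocked around fork() so that the signal cannot reach the
 * child before its handler is installed. */
static pid_t spawn_worker(void)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);

    pid_t pid = fork();
    if (pid == 0) {
        signal(SIGTERM, worker_on_sigterm);
        sigprocmask(SIG_UNBLOCK, &block, NULL);
        for (;;)
            pause(); /* sleep until a signal is delivered */
    }
    sigprocmask(SIG_SETMASK, &old, NULL); /* parent: restore old mask */
    return pid;
}

/* Hypothetical helper: ask the worker to stop, then reap it.
 * Returns the worker's exit status, 128 + signal number if it was
 * killed by a signal, or -1 on error. */
static int stop_and_reap_worker(pid_t worker)
{
    int status = 0;

    if (kill(worker, SIGTERM) == -1)
        return -1;

    /* Between the worker's exit and this call, the worker is a zombie.
     * waitpid() collects its status and removes the process table entry. */
    if (waitpid(worker, &status, 0) == -1)
        return -1;

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

If the parent never calls waitpid() (or wait()), the worker keeps showing up as a zombie even though its exit call has completed. An init such as tini does this kind of reaping for processes that end up re-parented to PID 1.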
I also tried to improve my signal handler to make it safe (thanks for the information in the comments). Here is the updated signal handler and some parts of the rest of the program:
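volatile atomic_int execute_loop = 1; /* needs <stdatomic.h> */

/* Signal handlers receive the signal number as their argument. */
void handle_sigterm(int signum) {
    (void)signum;
    /* Note: error_msg() is only safe here if it is async-signal-safe itself. */
    error_msg(sip_man_log, "(MAIN) INFO: Signal reached.");
    execute_loop = 0;
    error_msg(sip_man_log, "(MAIN) INFO: Closing all sockets.");
    /* close() is async-signal-safe; with sockfd closed, the interrupted
       accept() returns an error instead of blocking again. */
    close(sockfd_ext);
    close(connfd);
    close(sockfd);
}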
int main()
{
    struct sockaddr_in sockaddr, connaddr, sockaddr_ext;
    socklen_t connaddr_len; /* the type accept() expects for the length */
    char buffer[8192];
    int rv, rv_ext;

    signal(SIGTERM, handle_sigterm);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    [...]
    /* Runs until the SIGTERM handler clears execute_loop. */
    while(execute_loop)
    {
        connaddr_len = sizeof(connaddr);
        connfd = accept(sockfd, (struct sockaddr*)&connaddr, &connaddr_len);
        [...]
    }
    /* Cleanup happens here, outside the signal handler. */
    free_manipulation_table(int_modification_table);
    free_manipulation_table(ext_modification_table);
    free_manipulation_table(mir_modification_table);
    error_msg(sip_man_log, "(MAIN) INFO: Terminating Server by SIGTERM");
    return 0;
}
My excerpt above is heavily shortened, and I skipped most of the error handling. Thanks for your help!
Dennis