Tags: c, unix, protocol-buffers, ipc, unix-socket

What is the correct way of passing data through Unix sockets?


I'm working on a personal project to better understand inter-process communication on Unix. I have two binaries I compiled in C, and I am trying to pass data from one process to the other using Unix sockets.

I wanted to make my send/receive functions as generic as possible, so that I can pass ANY TYPE of data (int, char, complex structures) using the same message structure:

```c
enum DataType
{
    INT_TYPE,
    FLOAT_TYPE,
    CHAR_TYPE,
    STRUCT_TYPE,
};

struct Message
{
    int identifier;
    enum DataType data_type;
    void* data;
    size_t data_length;
};
```

This is the send function I came up with:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

ssize_t Send_message(const int pSocket, struct Message pMessage)
{
    // Send the fixed-size header struct over the socket
    ssize_t bytes_sent = send(pSocket, &pMessage, sizeof(struct Message), 0);

    if (bytes_sent == -1)
    {
        perror("Error in ipc.c, Send_message: Error sending message");
        return -1;
    }

    if (bytes_sent != (ssize_t)sizeof(struct Message))
    {
        fprintf(stderr, "Error in ipc.c, Send_message: Incomplete message sent\n");
        return -1;
    }

    // Send the payload bytes that data points to
    if (pMessage.data_length > 0 && pMessage.data != NULL)
    {
        // Check this call's result on its own, so that -1 is not hidden by +=
        ssize_t payload_sent = send(pSocket, pMessage.data, pMessage.data_length, 0);

        if (payload_sent == -1)
        {
            perror("Error in ipc.c, Send_message: Error sending message");
            return -1;
        }

        if ((size_t)payload_sent != pMessage.data_length)
        {
            fprintf(stderr, "Error in ipc.c, Send_message: Incomplete message sent\n");
            return -1;
        }

        bytes_sent += payload_sent;
    }

    // size_t must be printed with %zu
    printf("\nSent message with Data Type: %d, Identifier: %d, Data Length: %zu\n",
           (int)pMessage.data_type, pMessage.identifier, pMessage.data_length);

    return bytes_sent;
}
```

I thought the best way to be as generic as possible was to cast the data I want to pass to a void* and then cast it back to the correct type on the receiving end. Example from the sending process:

```c
struct Message response;
// ** Input here response.identifier
// ** Input here response.data_type
// ** Input here response.data_length

char *string_val = "HELLO WORLD";
int int_val = 42;
if (received_message.data_type == CHAR_TYPE)
{
    response.data = (void*)string_val;
}
if (received_message.data_type == INT_TYPE)
{
    response.data = (void*)&int_val;
}

Send_message(pSocket, response);
```

This works perfectly for basic types. But what if I want to pass complex structures like:

```c
typedef struct {
    int subparam1;
    float subparam2;
    char * subparam3;
} SubConfiguration;

SubConfiguration subconf;
// ** Fill in the struct

response.data = (void*)&subconf;

Send_message(pSocket, response);
```

EDIT: Adding Receive_message, which receives the message:

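```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

ssize_t Receive_message(const int pSocket, struct Message *pMessage)
{
    // Receive the fixed-size header into *pMessage
    ssize_t bytes_received = recv(pSocket, pMessage, sizeof(struct Message), 0);

    // A short read does not set errno, so report it with fprintf, not perror
    if (bytes_received != (ssize_t)sizeof(struct Message))
    {
        fprintf(stderr, "Error in ipc.c, Receive_message: Error receiving message\n");
        return -1;
    }

    if (pMessage->data_length > 0)
    {
        // Allocate a local buffer for the payload (the caller must free it)
        pMessage->data = malloc(pMessage->data_length);
        if (pMessage->data == NULL)
        {
            perror("Error in ipc.c, Receive_message: malloc failed");
            return -1;
        }

        ssize_t payload_received = recv(pSocket, pMessage->data, pMessage->data_length, 0);

        if (payload_received < 0 || (size_t)payload_received != pMessage->data_length)
        {
            fprintf(stderr, "Error in ipc.c, Receive_message: Error receiving message\n");
            free(pMessage->data);
            pMessage->data = NULL;
            return -1;
        }

        bytes_received += payload_received;
    }

    printf("\nReceived message with Data Type: %d, Identifier: %d, Data Length: %zu\n",
           (int)pMessage->data_type, pMessage->identifier, pMessage->data_length);

    return bytes_received;
}
```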

Now all I get on the receiving end are the int and float values of the structure. The char* I put in can't be accessed.

My questions are: is it possible to do what I am trying to do, and what am I doing wrong? I have started thinking about integrating a serialization library such as Protocol Buffers (protobuf) to serialize and deserialize my data correctly. Is that necessary in my case?


Solution

  • "Modern" systems (which means basically every system more complex than a microcontroller developed in the last 40 years or so) have virtual memory. This means every process has its own virtual address space, independent of other processes.

    If a process, let's call it process A, needs memory, it has to request it from the kernel (on Unix, the mmap() syscall can be used for this). The kernel then (or later, if lazy allocation is used) reserves physical memory for process A. Let's say the physical address starts at 0x12345600. Process A cannot access it through a pointer to 0x12345600, only through a virtual address, let's say 0xABCDEC00. The CPU automatically translates the virtual address 0xABCDEC00 to the physical address 0x12345600 for process A.

    Now suppose process A sends the pointer 0xABCDEC00 to process B. When process B accesses 0xABCDEC00, either nothing is mapped at that address in process B, which causes a segmentation fault, or process B has mapped something else at 0xABCDEC00, and that is accessed instead of the physical address 0x12345600. The result is unpredictable, which is why accessing such an address in C is undefined behaviour (UB).
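    A small way to observe per-process address spaces on a Unix system (an illustrative sketch, not part of the original answer): after fork(), parent and child use the same virtual addresses, but a write in the child lands in the child's own copy of the memory, so the parent never sees it. The variable and function names are hypothetical.

    ```c
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical demo variable; it has the same virtual address in parent and child */
    static int demo_value = 1;

    /* Forks a child that sets demo_value to 2, waits for it, and returns the
       value the parent sees afterwards (or -1 on error). */
    int value_seen_by_parent_after_child_write(void)
    {
        fflush(stdout); /* avoid duplicating buffered output in the child */

        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            return -1;
        }

        if (pid == 0) {
            /* Child: same virtual address, but its own copy of the memory */
            demo_value = 2;
            printf("child:  &demo_value = %p, demo_value = %d\n",
                   (void *)&demo_value, demo_value);
            fflush(stdout);
            _exit(0);
        }

        if (waitpid(pid, NULL, 0) == -1) {
            perror("waitpid");
            return -1;
        }

        /* Parent: the child's write is not visible here */
        printf("parent: &demo_value = %p, demo_value = %d\n",
               (void *)&demo_value, demo_value);
        return demo_value;
    }
    ```

    Both lines print the same address, yet the parent still reads 1: the same virtual address refers to different physical memory in each process.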

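    This is why a pointer received from another process, such as the char *subparam3 inside the received SubConfiguration, points either nowhere or to unrelated data. The int and float arrive intact because their values, not their addresses, are copied into the socket. For pointers, this cannot work.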
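    To make this concrete for the SubConfiguration from the question: sending &subconf with sizeof(SubConfiguration) bytes transmits the int, the float and the address stored in subparam3, never the characters that address points to. The sketch below (the helper name is hypothetical) copies the struct's raw bytes the same way send() reads them and extracts the pointer value from those bytes.

    ```c
    #include <stddef.h>
    #include <string.h>

    typedef struct {
        int subparam1;
        float subparam2;
        char *subparam3;
    } SubConfiguration;

    /* Copies the struct's raw bytes (what send(sock, conf, sizeof *conf, 0)
       would transmit) and returns the pointer value found in those bytes. */
    char *pointer_in_raw_bytes(const SubConfiguration *conf)
    {
        unsigned char wire[sizeof(SubConfiguration)];
        memcpy(wire, conf, sizeof wire);

        char *copied;
        memcpy(&copied, wire + offsetof(SubConfiguration, subparam3), sizeof copied);
        return copied; /* an address in the sender's address space, not the text */
    }
    ```

    The returned value equals the sender's own pointer, which is only meaningful inside the sending process, and sizeof(SubConfiguration) stays the same (typically 16 bytes on a 64-bit system) no matter how long the string is.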

    You may want to read about virtual memory, address translation, and the MMU (memory management unit).

    How to avoid this:

    You can either write the data itself into the socket, meaning that all the data you want to transmit is included in the write() or send() calls.
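    For the SubConfiguration in the question, writing the data itself into the socket means flattening the struct into bytes that contain the string's characters instead of its address. Below is a minimal sketch of one possible hand-written format; the layout (int32, float, uint32 length, string bytes) and the function names are assumptions for illustration. It uses the host's native byte order and float representation, which is fine for a Unix socket between processes on the same machine but not across machines; a portable wire format is the kind of problem libraries such as Protocol Buffers solve.

    ```c
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        int subparam1;
        float subparam2;
        char *subparam3;
    } SubConfiguration;

    /* Hypothetical layout: int32 subparam1 | float subparam2 | uint32 len | len string bytes.
       Returns the number of bytes written to buf, or 0 if buf is too small. */
    size_t serialize_subconf(const SubConfiguration *conf, unsigned char *buf, size_t buf_size)
    {
        int32_t p1 = (int32_t)conf->subparam1;
        size_t slen = conf->subparam3 ? strlen(conf->subparam3) : 0;
        if (slen > UINT32_MAX)
            return 0;
        uint32_t len32 = (uint32_t)slen;

        size_t total = sizeof p1 + sizeof conf->subparam2 + sizeof len32 + slen;
        if (total > buf_size)
            return 0;

        unsigned char *p = buf;
        memcpy(p, &p1, sizeof p1);
        p += sizeof p1;
        memcpy(p, &conf->subparam2, sizeof conf->subparam2);
        p += sizeof conf->subparam2;
        memcpy(p, &len32, sizeof len32);
        p += sizeof len32;
        if (slen > 0)
            memcpy(p, conf->subparam3, slen); /* the characters, not the pointer */
        return total;
    }

    /* Rebuilds a SubConfiguration from buf; the string is copied into memory
       allocated in the receiving process. Returns 0 on success, -1 on error. */
    int deserialize_subconf(const unsigned char *buf, size_t len, SubConfiguration *out)
    {
        int32_t p1;
        uint32_t slen;
        size_t header = sizeof p1 + sizeof out->subparam2 + sizeof slen;
        if (len < header)
            return -1;

        const unsigned char *p = buf;
        memcpy(&p1, p, sizeof p1);
        p += sizeof p1;
        memcpy(&out->subparam2, p, sizeof out->subparam2);
        p += sizeof out->subparam2;
        memcpy(&slen, p, sizeof slen);
        p += sizeof slen;
        if (len - header < slen)
            return -1;

        out->subparam1 = (int)p1;
        out->subparam3 = malloc((size_t)slen + 1);
        if (out->subparam3 == NULL)
            return -1;
        memcpy(out->subparam3, p, slen);
        out->subparam3[slen] = '\0';
        return 0;
    }
    ```

    On the sending side, the filled buffer could be passed to Send_message as data, with data_length set to the returned size; the receiver would call deserialize_subconf on the received payload and later free subparam3.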

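    Or you can reserve shared memory (also with mmap()). If you do it correctly (for example, by mapping the region at the same address in both processes, or by sending offsets into the region instead of raw pointers), you can then pass references to that shared memory to process B, and process B can access it.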
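    A minimal sketch of the shared-memory variant, assuming a system that provides MAP_ANONYMOUS (such as Linux): the parent creates a MAP_SHARED mapping before fork(), the child writes into it, and the parent reads it afterwards. The struct layout and function name are hypothetical. The string is stored inside the region and referenced by an offset, because unrelated processes (for example ones using shm_open()) may map the same region at different addresses; with a mapping inherited through fork() the addresses happen to match.

    ```c
    #define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS on glibc in strict modes */
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Hypothetical shared layout: an offset replaces the char * pointer */
    typedef struct {
        int subparam1;
        float subparam2;
        size_t subparam3_offset; /* offset of the string from the start of the region */
        char storage[256];       /* the string's bytes live inside the region */
    } SharedSubConfiguration;

    /* The child fills the shared region; the parent formats what it sees into out.
       Returns 0 on success, -1 on error. */
    int share_via_mmap(char *out, size_t out_size)
    {
        SharedSubConfiguration *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (shm == MAP_FAILED) {
            perror("mmap");
            return -1;
        }

        fflush(stdout);
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            munmap(shm, sizeof *shm);
            return -1;
        }

        if (pid == 0) {
            /* Child: write the values and the string into the shared region */
            shm->subparam1 = 42;
            shm->subparam2 = 3.5f;
            shm->subparam3_offset = offsetof(SharedSubConfiguration, storage);
            strcpy((char *)shm + shm->subparam3_offset, "HELLO WORLD");
            _exit(0);
        }

        if (waitpid(pid, NULL, 0) == -1) {
            perror("waitpid");
            munmap(shm, sizeof *shm);
            return -1;
        }

        /* Parent: turn the offset back into a pointer valid in this process */
        const char *s = (const char *)shm + shm->subparam3_offset;
        snprintf(out, out_size, "%d %.1f %s", shm->subparam1, shm->subparam2, s);
        munmap(shm, sizeof *shm);
        return 0;
    }
    ```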

    > I wanted to make my send/receive function as generic as possible to be able to pass ANY TYPE of data (int, char, complex structures) using the same message structure

    That is probably not the best idea, since it adds a lot of complexity that you could avoid. The exception is if you mean a plain stream of bytes (which is essentially what sockets, pipes, and files are): that is already very generic, but then you don't need to write any new functions, because the existing ones already do that.
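    One caveat when writing the data itself into a stream socket: on a SOCK_STREAM socket, send() and recv() may transfer fewer bytes than requested, and the Send_message and Receive_message functions in the question treat that as an error. A common pattern is to loop until everything has been transferred. This is a minimal sketch assuming a stream socket; send_all and recv_all are hypothetical helper names, not library functions.

    ```c
    #include <errno.h>
    #include <stddef.h>
    #include <sys/types.h>
    #include <sys/socket.h>

    /* Hypothetical helper: calls send() until all len bytes are sent.
       Returns 0 on success, -1 on error (errno is set by send()). */
    int send_all(int sock, const void *buf, size_t len)
    {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = send(sock, p, len, 0);
            if (n == -1) {
                if (errno == EINTR)
                    continue; /* interrupted by a signal: retry */
                return -1;
            }
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Hypothetical helper: calls recv() until exactly len bytes have arrived.
       Returns 0 on success, -1 on error or if the peer closed the connection early. */
    int recv_all(int sock, void *buf, size_t len)
    {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(sock, p, len, 0);
            if (n == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (n == 0)
                return -1; /* peer closed before len bytes arrived */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }
    ```

    Each send() and recv() call in Send_message and Receive_message could then be replaced by a call to one of these helpers.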