c++ · udp · posix · datagram · recvfrom

Find out size of UDP datagram BEFORE reading it into buffer?


I would like to receive whole UDP datagrams of arbitrary size and put the bytes into a buffer. However, before I allocate the buffer, I need to find out the size of the datagram. Unfortunately, I cannot find an example of this being done anywhere, nor do any answers on StackOverflow seem to address this fully.

The best I could come up with is this:

sockaddr_in senderAddress;
memset(&senderAddress, 0, sizeof(senderAddress));
socklen_t senderAddressLen = sizeof(senderAddress);
ssize_t datagramLength = recvfrom(udpSocket, nullptr, 0, MSG_PEEK, reinterpret_cast<sockaddr*>(&senderAddress), &senderAddressLen);

Unfortunately, this always returns zero, even when a recvfrom() call with an actual buffer provided would return a nonzero number of bytes.

It occurred to me that the length argument to recvfrom() might be the problem, but changing it to a larger value just causes recvfrom() to fail with "Bad address", probably because the data pointer being passed is null.

There are other possible ways to do this, but they seem to be inconsistent across systems in what they return. Consider the last comment here.

There's another question where someone asked about this (Receive an entire UDP datagram, regardless of size?), but the explanation is very brief, and it doesn't work for me, even if I include MSG_TRUNC.

Another question (know udp packet size with poll select or epoll) mentions using MSG_PEEK, but it doesn't fully address the question, and it seems to assume you want to peek at the data.

I don't actually want to peek at the data. I only want to find out the length, then allocate a buffer, then read the data into it. Is there a way to do this?

Thanks!

Note: Although I plan to run this on Linux, I'm doing my prototyping on a Mac. MSG_TRUNC seems to be defined, but it doesn't seem to do anything.


Solution

  • Find out size of UDP datagram BEFORE reading it into buffer?

    POSIX does not define any way to do this. You present and refer to potential techniques using ioctl(), but POSIX does not define any ioctls other than for STREAMS files (not to be confused with C streams or stream-oriented socket protocols). You also refer to a technique involving passing the MSG_TRUNC flag to recv() or recvfrom(), but

    1. MSG_TRUNC is not documented by POSIX, and
    2. Even if it gives access to the untruncated size, as it does on Linux, you should expect it to return that together with the (possibly truncated) message, so not before reading the message into the buffer.
      • But if recv() / recvfrom() accepts it at all, you should be able to combine MSG_TRUNC with MSG_PEEK to avoid consuming the message, so that you can make a second round trip to receive the whole message into a large enough buffer (a sketch follows below). This will work on Linux, but not necessarily on other POSIX platforms.
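
    For illustration, here is a minimal sketch of that Linux-specific combination. It assumes udpSocket is an already-bound SOCK_DGRAM descriptor; the function name and the use of std::vector are my own choices, and the code is not portable, since on macOS and other BSDs the peek reports only the truncated length.

    #include <cstddef>
    #include <sys/socket.h>
    #include <vector>

    // Linux-specific sketch: MSG_PEEK | MSG_TRUNC makes recvfrom() report the
    // datagram's full length even though only a tiny probe buffer is supplied,
    // and MSG_PEEK leaves the datagram queued so a second call can consume it.
    // Assumes a single reader, so the peeked datagram is the one dequeued next.
    ssize_t receiveWholeDatagram(int udpSocket, std::vector<char>& out)
    {
        char probe;  // a valid (non-null) buffer avoids the "Bad address" error
        ssize_t fullLength = recvfrom(udpSocket, &probe, sizeof probe,
                                      MSG_PEEK | MSG_TRUNC, nullptr, nullptr);
        if (fullLength < 0)
            return -1;  // errno describes the failure

        out.resize(static_cast<std::size_t>(fullLength));

        // A second call without MSG_PEEK removes the datagram from the queue.
        return recvfrom(udpSocket, out.data(), out.size(), 0, nullptr, nullptr);
    }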

    Overall, there is little call for the capability you request, because UDP applications generally have built-in, application-specific upper bounds on the datagram sizes they use. So there is no need to

    receive whole UDP datagrams of arbitrary size

    because that is not really a thing.

    At worst, the maximum payload size the UDP protocol is capable of representing is slightly less than 64 KiB, so if you don't want to impose any other limit then you can at least rely on that.

    before I allocate the buffer, I need to find out the size of the datagram

    If by "allocate" you mean dynamically allocate, then you would probably be better off just using a large-enough automatic array instead. Where "large enough" means sufficient to accommodate the largest datagram the application supports, which cannot exceed 64 KiB, but might be limited to something much less. There are cases where you might want to make a dynamically allocated copy, but even then, the extra allocation and copy might still outperform an extra syscall.

    Note: Although I plan to run this on Linux, I'm doing my prototyping on a Mac. MSG_TRUNC seems to be defined, but it doesn't seem to do anything.

    If you want something that works on both Mac and Linux, then you probably do want to limit yourself to features specified by POSIX. That very much puts you in the "POSIX does not define any way to do this" category. As described above, your best bet is to receive datagrams into a buffer large enough to accommodate any valid message a remote peer could send.
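
    As a concrete sketch of that approach (the function name, the use of an automatic array of the full UDP maximum, and the assumption that udpSocket is an already-bound SOCK_DGRAM descriptor are all mine), it could look something like this; a real application would normally substitute its own, much smaller protocol maximum:

    #include <cstddef>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <vector>

    // Portable (POSIX-only) sketch: receive into an automatic buffer large
    // enough for any datagram UDP can carry, then keep only the bytes received.
    std::vector<char> receiveDatagram(int udpSocket, sockaddr_in& senderAddress)
    {
        constexpr std::size_t kMaxUdpPayload = 65535;  // UDP cannot exceed this
        char buffer[kMaxUdpPayload];

        socklen_t senderAddressLen = sizeof senderAddress;
        ssize_t received = recvfrom(udpSocket, buffer, sizeof buffer, 0,
                                    reinterpret_cast<sockaddr*>(&senderAddress),
                                    &senderAddressLen);
        if (received < 0)
            return {};  // errno describes the failure

        // Any dynamic allocation happens here, after the size is already known.
        return std::vector<char>(buffer, buffer + received);
    }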

    If you need to sacrifice speed for minimal memory consumption then you could dynamically adapt to incoming message size by receiving with MSG_PEEK. If the buffer were completely filled then you would try again with a larger buffer. Ultimately, you would clear the message from the queue via a call without MSG_PEEK. But note that no variation on dynamic adaptation saves you any memory in practice if the application actually does receive a maximum-size datagram, so I'm not sure this is a problem worth solving.
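
    A sketch of that adaptive strategy, using only POSIX-specified behaviour, might look like the following. The 1 KiB starting size and the doubling policy are arbitrary choices for illustration, and a single reader is assumed so that the peeked datagram is the one ultimately dequeued.

    #include <cstddef>
    #include <sys/socket.h>
    #include <vector>

    // Peek with a trial buffer; if the peek fills it completely, the datagram
    // may have been truncated, so grow the buffer and peek again. A final call
    // without MSG_PEEK removes the datagram from the queue.
    std::vector<char> receiveAdaptively(int udpSocket)
    {
        std::vector<char> buffer(1024);

        for (;;) {
            ssize_t peeked = recvfrom(udpSocket, buffer.data(), buffer.size(),
                                      MSG_PEEK, nullptr, nullptr);
            if (peeked < 0)
                return {};  // errno describes the failure
            if (static_cast<std::size_t>(peeked) < buffer.size())
                break;      // the whole datagram fit, so 'peeked' is its true size
            buffer.resize(buffer.size() * 2);  // possibly truncated; retry larger
        }

        // The buffer is now known to be big enough; dequeue the datagram for real.
        ssize_t received = recvfrom(udpSocket, buffer.data(), buffer.size(), 0,
                                    nullptr, nullptr);
        buffer.resize(received > 0 ? static_cast<std::size_t>(received) : 0);
        return buffer;
    }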