Tags: c, sockets, ftp, circular-buffer, posix-select

Why should I use circular buffers when reading and writing to sockets in C?


I'm doing an assignment where the goal is to create a basic FTP server in C capable of handling multiple clients at once. The subject tells us to "wisely use circular buffers", but I don't really understand why or how.

I'm already using select to know when I can read from or write to my socket without blocking, as I'm not allowed to use recv, send or O_NONBLOCK.

Each connection has a structure where I store everything related to that client, such as the communication file descriptor, the network information and the buffers.

Why can't I just use read on my socket into a fixed-size buffer and then pass this buffer to the parsing function?

Same goes for writing: why can't I just dprintf my response into the socket?

From my point of view, using a circular buffer adds a useless layer of complexity, since its contents just get translated back into a string to parse the command or to send back the response.

Did I misunderstand the subject? Instead of storing individual characters, should I store commands and responses as circular buffers of strings?


Solution

  • Why should I use circular buffers when reading and writing to sockets in C?

    The socket interface does not itself provide a reason for using circular buffers (a.k.a. ring buffers). You should be looking instead at the requirements of the application protocol that runs over the socket -- FTP in this case. Those requirements are colored by the characteristics of the underlying transport protocol (TCP for FTP) and their effect on the behavior of the socket layer.

    Why can't I just use read on my socket into a fixed-size buffer and then pass this buffer to the parsing function?

    You surely could do without circular buffers, but that wouldn't be as simple as you seem to suppose. And that's not the question you should be asking anyway: it's not whether circular buffers are required, but what benefit they can provide that you might not otherwise get. More on that later.

    Also, you surely can have fixed-size circular buffers -- "circular" and "fixed-size" are orthogonal characteristics. In fact, one of the usual objectives of using a circular buffer is to minimize or eliminate any need for dynamically adjusting the buffer size.

    Same goes for writing: why can't I just dprintf my response into the socket?

    Again, you probably could do as you describe. The question is what do you stand to gain from interposing a circular buffer? Again, more later.

    From my point of view, using a circular buffer adds a useless layer of complexity, since its contents just get translated back into a string to parse the command or to send back the response.

    Did I misunderstand the subject?

    That you are talking about translating to and from strings makes me think that you did indeed misunderstand the subject.

    Instead of storing individual characters, should I store commands and responses as circular buffers of strings?

    Again, where do you think "of strings" comes into it? Why are you supposing that the elements of the buffer(s) would represent (whole) messages?

    A circular buffer is more a manner of use of an ordinary, flat, usually fixed-size buffer than it is a separate data structure of its own. There is a little bit of extra bookkeeping data involved, however, so I won't quibble with anyone who wants to call it a data structure in its own right.
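
    To make "a flat buffer plus a little bookkeeping" concrete, here is a minimal sketch of a byte-oriented ring. The names `struct ring`, `ring_put`, `ring_get` and `RING_CAP` are all made up for illustration, and the capacity is deliberately tiny so that wrap-around is easy to see; this is one possible layout, not the one your assignment requires.

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 8  /* hypothetical, deliberately tiny capacity */

/* Hypothetical type: an ordinary flat array plus two bookkeeping fields. */
struct ring {
    unsigned char data[RING_CAP];
    size_t head;  /* index of the oldest unread byte */
    size_t len;   /* number of unread bytes currently stored */
};

/* Append up to n bytes at the logical end; returns how many actually fit. */
static size_t ring_put(struct ring *r, const void *src, size_t n)
{
    const unsigned char *p = src;
    size_t i;

    assert(r->len <= RING_CAP);
    for (i = 0; i < n && r->len < RING_CAP; i++) {
        /* The logical end wraps around to index 0 past the physical end. */
        r->data[(r->head + r->len) % RING_CAP] = p[i];
        r->len++;
    }
    return i;
}

/* Remove up to n bytes from the logical beginning; returns how many were copied. */
static size_t ring_get(struct ring *r, void *dst, size_t n)
{
    unsigned char *p = dst;
    size_t i;

    for (i = 0; i < n && r->len > 0; i++) {
        p[i] = r->data[r->head];
        r->head = (r->head + 1) % RING_CAP;
        r->len--;
    }
    return i;
}
```

    Note that the elements are plain bytes, not strings or messages. Consuming bytes with `ring_get` is what frees room for later `ring_put` calls, with no data ever being moved.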

    Circular buffers for input

    Among the main contexts for circular buffers' usefulness is data arriving with stream semantics (such as TCP provides) rather than with message semantics (such as UDP provides). With respect to your assignment, consider this: when the server reads command input, how does it know where the command ends? I suspect you're supposing that you will get one complete command per read(), but that is in no way a safe assumption, regardless of the implementation of the client. You may get partial commands, multiple commands, or both on each read(), and you need to be prepared to deal with that.

    So suppose, for example, that you receive one and a half control messages in one read(). You can parse and respond to the first, but you need to read more data before you can act on the second. Where do you put that data? Ok, you read it into the end of the buffer. And what if on the next read() you get not only the rest of a message, but also part of another message?

    You cannot keep on indefinitely adding data at the end of the buffer, not even if you dynamically allocate more space as needed. You could at some point move the unprocessed data from the tail of the buffer to the beginning, thus opening up space at the end, but that is costly, and at this point we are well past the simplicity you had in mind. (That simplicity was always imaginary.) Alternatively, you can perform your reads into a circular buffer, so that consuming data from the (logical) beginning of the buffer automatically makes space available at the (logical) end.
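
    As a rough sketch of that last alternative: the code below does one `read()` per readiness notification into whatever contiguous free space follows the logical end, and separately pulls out complete commands when they are available. It assumes commands are terminated by CRLF, as on the FTP control connection. The names `struct inbuf`, `inbuf_fill`, `inbuf_pop_line` and `IN_CAP` are hypothetical, as is the capacity of 64.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define IN_CAP 64  /* hypothetical; must exceed the longest command you accept */

struct inbuf {
    char data[IN_CAP];
    size_t head;  /* index of the oldest unconsumed byte */
    size_t len;   /* number of unconsumed bytes */
};

/* Call once after select() reports fd readable. Does a single read() into the
 * contiguous free space after the logical end. Returns read()'s result, or -2
 * if the buffer is already full. */
static ssize_t inbuf_fill(struct inbuf *b, int fd)
{
    size_t tail, room, chunk;
    ssize_t n;

    assert(b->len <= IN_CAP);
    room = IN_CAP - b->len;
    if (room == 0)
        return -2;
    tail = (b->head + b->len) % IN_CAP;
    chunk = IN_CAP - tail;  /* bytes up to the physical end of the array */
    if (chunk > room)
        chunk = room;       /* never overwrite unconsumed data */
    n = read(fd, b->data + tail, chunk);
    if (n > 0)
        b->len += (size_t)n;
    return n;
}

/* If a complete CRLF-terminated line is buffered, copy it (without the CRLF)
 * into out as a NUL-terminated string, consume it, and return 1. Return 0 if
 * no complete line is buffered yet, or -1 if the line does not fit in out
 * (in which case nothing is consumed). */
static int inbuf_pop_line(struct inbuf *b, char *out, size_t outsz)
{
    size_t i, j;

    for (i = 0; i + 1 < b->len; i++) {
        if (b->data[(b->head + i) % IN_CAP] == '\r' &&
            b->data[(b->head + i + 1) % IN_CAP] == '\n') {
            if (i >= outsz)
                return -1;
            for (j = 0; j < i; j++)
                out[j] = b->data[(b->head + j) % IN_CAP];
            out[i] = '\0';
            b->head = (b->head + i + 2) % IN_CAP;
            b->len -= i + 2;
            return 1;
        }
    }
    return 0;
}
```

    The intended use would be to call `inbuf_fill` once per readable notification and then loop on `inbuf_pop_line` until it returns 0, handing each line to the parser. A partial command simply stays in the buffer until a later read completes it, and a read that stops at the physical end of the array picks up the rest on the next round.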

    Circular buffers for output

    Something similar applies on the writing side with a stream-oriented network protocol. Consider that you cannot write() an arbitrary amount of data at a time, and it is very hard to know in advance exactly how much you can write. That's more likely to bite you on the data connection than on the control connection, but in principle, it applies to both. If you have only one client to feed at a time then you can keep write()ing in a loop until you've successfully transferred all the data, and this is what dprintf() would do. But that's potentially a blocking operation, so it undercuts your responsiveness when you are serving multiple clients at the same time, and maybe even with just one if (as with FTP) there are multiple connections per client.

    You need to buffer data on the server, especially for the data connection, and now you have pretty much the same problem that you did on the reading side: when you've written only part of the data you want to send, and the socket is not ready for you to send more, what do you do? You could just track where you are in the buffer, and send more pieces as you can until the buffer is empty. But then you are wasting opportunities to read more data from the source file, or to buffer more control responses, until you work through the buffer. Once again, a circular buffer can mitigate that, by giving you a place to buffer more data without requiring it to start at the beginning of the buffer or being limited by the available space before the physical end of the buffer.
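
    A minimal sketch of the output side, under the same caveats as before: `struct outbuf`, `outbuf_queue`, `outbuf_flush` and `OUT_CAP` are hypothetical names, and the capacity is arbitrary. Responses are queued at the logical end whenever there is room, and each writable notification from select() triggers one `write()` of the contiguous pending run.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define OUT_CAP 64  /* hypothetical capacity */

struct outbuf {
    char data[OUT_CAP];
    size_t head;  /* index of the oldest unsent byte */
    size_t len;   /* number of unsent bytes */
};

/* Queue n bytes at the logical end. All-or-nothing: returns 0 on success,
 * -1 if there is not enough free space for the whole message. */
static int outbuf_queue(struct outbuf *b, const char *src, size_t n)
{
    size_t i;

    assert(b->len <= OUT_CAP);
    if (n > OUT_CAP - b->len)
        return -1;
    for (i = 0; i < n; i++)
        b->data[(b->head + b->len + i) % OUT_CAP] = src[i];
    b->len += n;
    return 0;
}

/* Call once after select() reports fd writable. Does a single write() of the
 * contiguous run of pending bytes and consumes whatever was accepted.
 * Returns write()'s result, or 0 if nothing was pending. */
static ssize_t outbuf_flush(struct outbuf *b, int fd)
{
    size_t chunk = b->len;
    ssize_t n;

    if (chunk == 0)
        return 0;
    if (chunk > OUT_CAP - b->head)
        chunk = OUT_CAP - b->head;  /* stop at the physical end of the array */
    n = write(fd, b->data + b->head, chunk);
    if (n > 0) {
        b->head = (b->head + (size_t)n) % OUT_CAP;
        b->len -= (size_t)n;
    }
    return n;
}
```

    With something like this, a connection would be added to select()'s write set only while its `len` is nonzero, and the server can keep queuing data (for example, the next chunk of a file) while earlier bytes are still waiting to go out. One caveat given your constraints: on a blocking descriptor, a `write()` can still wait if the chunk is larger than what the socket can accept at that moment, so keeping each chunk modest is an assumption built into this sketch.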