c linux io operating-system input-devices

Is stdin treated as a character device in Linux?

When I say stdin, I am referring to the stream referred to by fd = 0.

I am taking an OS course which covers block and character devices. It specifically said that the keyboard is a character device. However, when we were shown the read syscall, we were told that the kernel doesn't care what it is reading from as long as it is a block device or a file on a block device.

This is the code we were given:

#include <unistd.h>

const int BUFFSIZE = 5;

int main (void) {
  int n;
  char buffer[BUFFSIZE];

  int stdin = 0;
  int stdout = 1;
  int stderr = 2;

  do {
    n = read (stdin, buffer, BUFFSIZE);
    if (n < 0) {
      /* sizeof counts the terminating NUL, so subtract 1 */
      write (stderr, "Error occurred\n", sizeof "Error occurred\n" - 1);
    } else {
      write (stdout, "Entered if\n", sizeof "Entered if\n" - 1);
      write (stdout, buffer, n);
    }
  } while (n > 0);
  return 0;
}

My question is: how does Linux treat standard input (fd = 0)? Is it treated as a character device, or does the kernel do some kind of buffering? (Buffering seems likely, judging by the results I got when running the code.)

Additionally, it would be useful to know if I can use the read syscall for reading from character devices in general. If so, is the input buffered?

Solution

The kernel generally does little or no buffering on character devices.

The kernel does a certain amount of buffering when reading from files in filesystems.

You can't say what kind of a device standard input is, because it varies from process to process. By default, fd 0 is usually the user's keyboard, which is a character device. But if I say

program < file

then fd 0 is an ordinary file. If I say

program < /dev/sda1

then fd 0 is a block device. And if I worked at it I could probably manage to get fd 0 hooked up to a network socket, too.

In Linux, there's also /proc/<pid>/fd/0, but that's not a device, either; it ends up looking like a symlink to whatever fd 0 actually refers to (the device in /dev, if it is one).

Addendum: whether a particular device is buffered or not really depends on how the driver for that device is written. Any given driver may or may not implement some form of buffering. Furthermore, whether or not the buffering is actually used may end up depending on other factors. (For example, the Unix terminal drivers are all line-buffered by default, but that buffering is turned off if you put the driver into "cbreak" or "raw" mode.) I don't think you can make any general statements saying that character or block devices are or aren't buffered.

Addendum 2: When you start peeling back the layers, it can get pretty complicated. Unix strives mightily (and generally does a very good job) to strike the right balance between do-what-I-mean and keep-it-simple, stupid. For example, if you've got a terminal that's not line-buffered, and you ask for 10 characters, but there are only 3 available, read() will return 3. Which is the right thing, but it suggests that there's still a buffer somewhere, where those three characters accumulated between the time they were typed and the time you read them. Furthermore, if you asked for only 3, but there were 10 available, under some circumstances I think the other 7 would get saved for you, again suggesting a fair amount of kernel-level buffering.

But in raw mode, I'm pretty sure you can lose characters if you don't read them fast enough. Switching our attention from the terminal driver to network sockets, I had thought that under certain circumstances if you do a read() on a UDP-mode socket, and the actual UDP packet is bigger than your read request, you can lose the rest of the packet there, too. [Although a commenter suggests I may be wrong.] (TCP mode sockets, on the other hand, are obviously hugely buffered!)

So, bottom line: the rules can be complicated, and the precise details definitely depend on not only the particular device driver in use, but also potentially myriad other details.