
C and Erlang: Erlang Port example


Disclaimer: The author of the question has an average knowledge of Erlang and a basic knowledge of C.

I am reading the Interoperability Tutorial User Guide now. I have successfully compiled the complex.c example and it works with the Erlang Port without any problems.

However, I would like to understand how the actual C code works. I understand it in general: in the example it reads 2 bytes from the standard input and checks the first byte. Depending on the first byte, it calls either the foo or the bar function. This is the limit of my understanding of it right now.

So, if we take both erl_comm.c:

/* erl_comm.c */

typedef unsigned char byte;

read_cmd(byte *buf)
{
  int len;

  if (read_exact(buf, 2) != 2)
    return(-1);
  len = (buf[0] << 8) | buf[1];
  return read_exact(buf, len);
}

write_cmd(byte *buf, int len)
{
  byte li;

  li = (len >> 8) & 0xff;
  write_exact(&li, 1);

  li = len & 0xff;
  write_exact(&li, 1);

  return write_exact(buf, len);
}

read_exact(byte *buf, int len)
{
  int i, got=0;

  do {
    if ((i = read(0, buf+got, len-got)) <= 0)
      return(i);
    got += i;
  } while (got<len);

  return(len);
}

write_exact(byte *buf, int len)
{
  int i, wrote = 0;

  do {
    if ((i = write(1, buf+wrote, len-wrote)) <= 0)
      return (i);
    wrote += i;
  } while (wrote<len);

  return (len);
}

and port.c:

/* port.c */

typedef unsigned char byte;

int main() {
  int fn, arg, res;
  byte buf[100];

  while (read_cmd(buf) > 0) {
    fn = buf[0];
    arg = buf[1];

    if (fn == 1) {
      res = foo(arg);
    } else if (fn == 2) {
      res = bar(arg);
    }

    buf[0] = res;
    write_cmd(buf, 1);
  }
}

What does each function actually do there? What purpose do li, len, i, wrote, got variables actually serve?

Some more small questions:

  1. Why don't the functions have any return types, not even void?
  2. When the Erlang port sends data to C, the first byte determines which function is called. If the byte holds the decimal 1, foo() is called; if it holds the decimal 2, bar() is called. Left unchanged, this protocol can be used to call up to 255 different C functions with only 1 parameter each. Is that right?
  3. "Adding the length indicator will be done automatically by the Erlang port, but must be done explicitly in the external C program". What does that mean? On which line of code is it done?
  4. From the Tutorial: "By default, the C program should read from standard input (file descriptor 0) and write to standard output (file descriptor 1)." Then: "Note that stdin and stdout are for buffered input/output and should not be used for the communication with Erlang!" What is the catch here?
  5. Why is buf declared with a size of [100]?

Solution

  • This answer is likewise disclaimed (I'm not an Erlang or C programmer, I just happen to be going through the same material)

    Your initial model is a bit off. The code actually works by reading the first two bytes from stdin, treating them as the length of the actual message, and then reading that many more bytes from stdin. In this specific case, the actual message happens to always be two bytes (a number corresponding to a function and a single integer argument to pass to it).
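
    To make the layout concrete, here is a minimal sketch that decodes one complete frame already held in memory. The struct call type and the decode_call function are hypothetical names introduced for illustration; they are not part of the tutorial code, which reads straight from file descriptor 0 instead.

```c
#include <stddef.h>

typedef unsigned char byte;

/* Hypothetical type, not part of the tutorial: one decoded request. */
struct call {
  int fn;   /* 1 selects foo, 2 selects bar in port.c */
  int arg;  /* the single one-byte argument */
};

/* Decode one frame: a two-byte length header (high byte first) followed
   by the payload. Returns 0 on success, or -1 if the frame is truncated
   or the payload is not exactly the two bytes port.c expects. */
int decode_call(const byte *frame, size_t frame_len, struct call *out)
{
  size_t len;

  if (frame_len < 2)
    return -1;                       /* header incomplete */
  len = ((size_t)frame[0] << 8) | frame[1];
  if (len != 2 || frame_len < 2 + len)
    return -1;                       /* not a complete (fn, arg) message */
  out->fn = frame[2];
  out->arg = frame[3];
  return 0;
}
```

    Under these assumptions, a request for foo(3) would arrive as the four bytes 00 02 01 03: the first two say "two bytes follow", and the last two are the function number and the argument.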

    0 - a) read_exact reads len bytes from stdin. read_cmd uses read_exact twice: first to read the two-byte header that says how many bytes follow (returning -1 if fewer than two bytes are available), and then to read that many bytes. write_exact writes len bytes to stdout. write_cmd uses write_exact to output a two-byte length header, followed by a message (hopefully) of the appropriate length.
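
    The reason for the loop in read_exact is that a single read call may return fewer bytes than requested. The sketch below is a variant that takes the descriptor as a parameter, which is an assumption made here so it can be tried on a pipe; the tutorial version hard-codes descriptor 0. The name read_exact_fd is hypothetical.

```c
#include <unistd.h>

typedef unsigned char byte;

/* Keep calling read until len bytes have arrived. Returns len on
   success, 0 if the input ended first, or a negative value on error. */
int read_exact_fd(int fd, byte *buf, int len)
{
  int got = 0;   /* bytes collected so far */
  ssize_t n;     /* result of the latest read call */

  while (got < len) {
    n = read(fd, buf + got, len - got);
    if (n <= 0)
      return (int)n;   /* end of input or error: give up */
    got += (int)n;
  }
  return len;
}
```

    If only part of the requested data is available and the other end then closes the stream, the first read returns the partial data and the second returns 0, so the function reports 0 rather than a short count.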

    0 - b) I think len is sufficiently covered above. li is the variable used to build that two-byte header in the write function (I can't take you through the bit shift operations step by step, but the end result is that len is represented in the first two bytes sent). i holds the return value of each read or write call: the number of bytes transferred, or a value of zero or less on end of input or error, in which case that value is returned as the result of read_exact/write_exact. wrote and got keep track of how many bytes have been written/read so far; the containing loops exit once that count reaches len.
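
    For what it's worth, the shifts can be traced with a concrete number. The sketch below isolates the same expressions used in write_cmd and read_cmd into two small functions; encode_len and decode_len are hypothetical names, and the value 300 is only an example.

```c
typedef unsigned char byte;

/* Split len (0..65535) into two header bytes, high byte first.
   For len == 300 (0x012C): hdr[0] becomes 1, hdr[1] becomes 44. */
void encode_len(int len, byte hdr[2])
{
  hdr[0] = (len >> 8) & 0xff;  /* drop the low 8 bits, keep the next 8 */
  hdr[1] = len & 0xff;         /* keep only the low 8 bits */
}

/* Reassemble the length: (1 << 8) | 44 == 256 + 44 == 300. */
int decode_len(const byte hdr[2])
{
  return (hdr[0] << 8) | hdr[1];
}
```

    write_cmd does the same thing, except that it reuses the single variable li for both bytes and writes each one out as soon as it is computed.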

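    1 - I'm actually not sure. The versions I was working with are declared int, but are otherwise identical; I got mine out of chapter 12 of Programming Erlang rather than the guide you link. That would be consistent with older (pre-C99) C, where a function declared without a return type defaults to int.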

    2 - That's correct, but the point of the port protocol is that you can change it to send different arguments (if you're sending arbitrary arguments, it would probably be a better idea to use the C Node method rather than ports). As an example, I modified it slightly in a recent piece so that it sends a single string; since I only have one function I want to call on the C side, there is no need to specify a function. I should also mention that if you have a system that needs to call more than 255 different operations written in C, you may want to rethink its structure (or just go the whole nine yards and write it all in C).
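
    As a rough illustration of that kind of change, the sketch below treats the whole payload as one string instead of a (function, argument) pair. The handle_string function and its upper-casing behaviour are made up for this example; they are not the modification mentioned above, only a stand-in for "one function that takes a string".

```c
#include <ctype.h>

typedef unsigned char byte;

/* Hypothetical handler: upper-cases the payload in place and returns
   the length of the reply. The payload is length-delimited by the
   two-byte header, so it is not assumed to be NUL-terminated. */
int handle_string(byte *buf, int len)
{
  int i;

  for (i = 0; i < len; i++)
    buf[i] = (byte)toupper(buf[i]);
  return len;
}
```

    With something like this, the loop in port.c could plausibly shrink to reading a message with read_cmd, keeping the returned length, and passing the result of handle_string(buf, len) to write_cmd as the reply length.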

    3 - This is done

    read_cmd(byte *buf)
    {
      int len;
    
      if (read_exact(buf, 2) != 2)   // HERE
        return(-1);                  // HERE
      len = (buf[0] << 8) | buf[1];  // HERE
      return read_exact(buf, len);
    }
    

    in the read_cmd function and

    write_cmd(byte *buf, int len)
    {
      byte li;
    
      li = (len >> 8) & 0xff;        // HERE
      write_exact(&li, 1);           // HERE
    
      li = len & 0xff;               // HERE
      write_exact(&li, 1);           // HERE
    
      return write_exact(buf, len);
    }
    

    in the write_cmd function. I think the explanation is covered in 0 - a); that's a header that tells/finds out how long the rest of the message will be (yes, this means that the message can only be a finite length, and that length must be expressible in two bytes, so at most 65535).

    4 - I'm not entirely sure why that would be a catch here. Care to elaborate?

    5 - buf is a byte array, and has to be explicitly bounded (for memory management purposes, I think). I read "100" here as "a number larger than the maximum message size we plan to accommodate". The actual number picked seems arbitrary; anything 4 or higher would do, but I could stand to be corrected on that point.
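
    One thing worth noting is that read_cmd trusts the header: it reads however many bytes the header announces into buf, and a two-byte header can announce far more than 100. A minimal sketch of a guard, assuming you would rather reject oversized messages than overflow the buffer, could look like this. The name checked_len is hypothetical and nothing like it appears in the tutorial.

```c
typedef unsigned char byte;

/* Return the payload length announced by the two header bytes, or -1
   if that many bytes would not fit in a buffer of cap bytes. */
int checked_len(const byte hdr[2], int cap)
{
  int len = (hdr[0] << 8) | hdr[1];

  return (len <= cap) ? len : -1;
}
```

    read_cmd could call such a check between reading the header and reading the payload, passing the size of buf as cap.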