chapel

Is there a way to string.format a runtime-determined number of items in one call in Chapel?


Is there a way to use one string.format() call to format a runtime-determined number of items?

I wrote a program that generates a million md5sums and converts them to string with

  md5.getDigest(input).toHexString();

It took 3 minutes on my laptop, but only 50 seconds if I modify the Crypto package's toHexString() as follows to use fewer separate .format() calls to generate the string (I assume the difference comes from the string append after each call). I take this to mean that formatting the string takes at least twice as long as generating the md5sum in the first place.

@@ -239,9 +239,17 @@ module Crypto {
     */
     proc toHexString() throws {
       var buffHexString: string;
-      for i in this.buffDomain do {
+      var next = this.buffDomain.first;
+      for i in this.buffDomain by 8 do
+       if this.buffDomain.contains(i+7) {
+         buffHexString += try ("%02xu"*8).format(
+               this.buff[i+0], this.buff[i+1], this.buff[i+2], this.buff[i+3],
+               this.buff[i+4], this.buff[i+5], this.buff[i+6], this.buff[i+7]);
+         next = i+8;
+       }
+      for i in this.buffDomain[next..] do
         buffHexString += try "%02xu".format(this.buff[i]);
-      }
+
       return buffHexString;
     }
   }

but that's gross. I'd like to do something like

buffHexString = try ("%02xu" * this.buff.size).format(this.buff .... something);

but string.format() only accepts its varargs as args ...?k, which requires the number of arguments to be a compile-time param.
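
To illustrate the limitation: a tuple whose size is known at compile time can be expanded into the call with (...t), so a sketch like the one below works, but only for fixed sizes. hexOfTuple is a hypothetical helper name used for illustration; it is not part of the Crypto package.

```chapel
use IO;

// Hypothetical helper: n is a compile-time param taken from the tuple type,
// so both the repeated format string and the expanded argument list have a
// size the compiler knows. This cannot be called with a runtime-sized array.
proc hexOfTuple(t: ?n * uint(8)): string throws {
  return try ("%02xu" * n).format((...t));
}

const t = (0x0a: uint(8), 0xff: uint(8));
writeln(try! hexOfTuple(t));
```

This covers something like a fixed 16-byte digest held in a tuple, but it does not help when the length is only known at run time, which is the case I am asking about.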

The question I'm asking is about getting string.format to work like this, but I'd also be happy with another way to generate a string like this all at once without any intermediate temporary strings. (I don't see a way to do it in Chapel code via the string.createBorrowingBuffer() without dropping into c_ptrs.)


Solution

  • I think this post points out something missing from the string and bytes types. What is missing is the ability to append a codepoint (as an int(32)) or a byte (as a uint(8)). I will look at adding these to Chapel's standard library.

    But, I will answer your question more directly. It turns out that string.format actually operates through the IO system. That is how the format strings match with writef -- it is actually using the same implementation. string.format does this with openMemFile. But you can do the same thing, e.g.:

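    use IO;

    // Format every element of A as two hex digits, writing into one
    // in-memory file instead of appending many temporary strings.
    proc toHex(A: [] uint(8)): string throws {
      var f = openMemFile();
      {
        // The writer is flushed and closed when this block ends.
        var w = f.writer(locking=false);
        for byte in A {
          w.writef("%02xu", byte);
        }
      }
      var r = f.reader(locking=false);
      return r.readAll(string);
    }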
    

    In some quick performance experiments in this area, I observed on my system (running toHex on a 16-element array of uint(8)):

    But even better performance is available with the ability to append a numeric byte value / codepoint value: