javajava-collections-api

Why jdk code style uses a variable assignment and read on the same line - eg. (i=2) < max


I noticed that in the jdk source code and more specifically in the collections framework there is a preference in assigning variables right before reading them in expressions. Is it just a simple preference or is it something more important which I am not aware of? One reason that I can think of is that the variable is used in this expression only.

As I am not used to this style I find it hard to read it. The code is very condensed. Below you can see an example taken from java.util.HashMap.getNode()

Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 && ...) {
   ...
}

Solution

  • As already mentioned in the comment: Doug Lea, who is one of the main authors of the collections framework and the concurrency packages, tends to do optimizations that may look confusing (or even counterintuitive) for mere mortals.

    A "famous" example here is copying fields to local variables in order to minimize the size of the bytecode, which is actually also done with the table field and the local tab variable in the example that you referred to!


    For very simple tests, it does not seem to make a difference (referring to the resulting bytecode size) whether the accesses are "inlined" or not. So I tried to create an example that roughly resembles the structures of the getNode method that you mentioned: An access to a field that is an array, a length check, an access to a field of one array element...

    The code:

    class Node
    {
        int k;
        int j;
    }
    
    public class AssignAndUseTestComplex
    {
        public static void main(String[] args)
        {
            AssignAndUseTestComplex t = new AssignAndUseTestComplex();
            t.testSeparate(1);
            t.testInlined(1);
            t.testRepeated(1);
        }
    
        private Node table[] = new Node[] { new Node() };
    
        int testSeparate(int value)
        {
            Node[] tab = table;
            if (tab != null)
            {
                int n = tab.length;
                if (n > 0)
                {
                    Node first = tab[(n-1)];
                    if (first != null)
                    {
                        return first.k+first.j;
                    }
                }
            } 
            return 0;
        }
    
        int testInlined(int value)
        {
            Node[] tab; Node first, e; int n;
            if ((tab = table) != null && (n = tab.length) > 0 && 
                (first = tab[(n - 1)]) != null) {
                return first.k+first.j;
            }
            return 0;
        }
    
        int testRepeated(int value)
        {
            if (table != null)
            {
                if (table.length > 0)
                {
                    if (table[(table.length-1)] != null)
                    {
                        return table[(table.length-1)].k+table[(table.length-1)].j;
                    }
                }
            } 
            return 0;
        }
    
    }
    

    And the resulting bytecodes: The testSeparate method uses 41 instructions:

      int testSeparate(int);
        Code:
           0: aload_0
           1: getfield      #15                 // Field table:[Lstackoverflow/Node;
           4: astore_2
           5: aload_2
           6: ifnull        40
           9: aload_2
          10: arraylength
          11: istore_3
          12: iload_3
          13: ifle          40
          16: aload_2
          17: iload_3
          18: iconst_1
          19: isub
          20: aaload
          21: astore        4
          23: aload         4
          25: ifnull        40
          28: aload         4
          30: getfield      #37                 // Field stackoverflow/Node.k:I
          33: aload         4
          35: getfield      #41                 // Field stackoverflow/Node.j:I
          38: iadd
          39: ireturn
          40: iconst_0
          41: ireturn
    

    The testInlined method indeed is a tad smaller, with 39 instructions

      int testInlined(int);
        Code:
           0: aload_0
           1: getfield      #15                 // Field table:[Lstackoverflow/Node;
           4: dup
           5: astore_2
           6: ifnull        38
           9: aload_2
          10: arraylength
          11: dup
          12: istore        5
          14: ifle          38
          17: aload_2
          18: iload         5
          20: iconst_1
          21: isub
          22: aaload
          23: dup
          24: astore_3
          25: ifnull        38
          28: aload_3
          29: getfield      #37                 // Field stackoverflow/Node.k:I
          32: aload_3
          33: getfield      #41                 // Field stackoverflow/Node.j:I
          36: iadd
          37: ireturn
          38: iconst_0
          39: ireturn
    

    Finally, the testRepeated method uses a whopping 63 instructions

      int testRepeated(int);
        Code:
           0: aload_0
           1: getfield      #15                 // Field table:[Lstackoverflow/Node;
           4: ifnull        62
           7: aload_0
           8: getfield      #15                 // Field table:[Lstackoverflow/Node;
          11: arraylength
          12: ifle          62
          15: aload_0
          16: getfield      #15                 // Field table:[Lstackoverflow/Node;
          19: aload_0
          20: getfield      #15                 // Field table:[Lstackoverflow/Node;
          23: arraylength
          24: iconst_1
          25: isub
          26: aaload
          27: ifnull        62
          30: aload_0
          31: getfield      #15                 // Field table:[Lstackoverflow/Node;
          34: aload_0
          35: getfield      #15                 // Field table:[Lstackoverflow/Node;
          38: arraylength
          39: iconst_1
          40: isub
          41: aaload
          42: getfield      #37                 // Field stackoverflow/Node.k:I
          45: aload_0
          46: getfield      #15                 // Field table:[Lstackoverflow/Node;
          49: aload_0
          50: getfield      #15                 // Field table:[Lstackoverflow/Node;
          53: arraylength
          54: iconst_1
          55: isub
          56: aaload
          57: getfield      #41                 // Field stackoverflow/Node.j:I
          60: iadd
          61: ireturn
          62: iconst_0
          63: ireturn
    

    So it seems that this "obscure" way of writing the queries and assignments can, indeed, save a few bytes of bytecode, and (given the justification in the linked answer about storing fields in local variables) this may have been the reason to use this style.

    But...

    in any case: After the method has been executed a few times, the JIT will kick in, and the resulting machine code will have "nothing" to do with the original bytecode - and I'm pretty sure that all three versions will actually be compiled to the same machine code in the end.

    So the bottom line is: Don't use this style. Instead, just write dumb code that is easy to read and maintain. You'll know when it's your turn to use "optimizations" like these.


    EDIT: A short addendum...

    I have made a further test, and compared the testSeparate and testInlined method concerning the actual machine code that is generated by the JIT.

    I modified the main method a bit, to prevent unrealistic over-optimizations or other shortcuts that the JIT might take, but the actual methods have been left unmodified.

    And as expected: When calling the methods a few thousand times with a hotspot disassembly JVM and -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:+PrintAssembly, then the actual machine code of both methods is identical.

    So once more, the JIT does its job pretty well, and the programmer can focus on writing readable code (whatever that means).

    ... and and a minor correction/clarification:

    I did not test the third method, testRepeated, because it is not equivalent to the other methods (and thus, it can not result in the same machine code). This is, by the way, another small advantage of the strategy of storing fields in local variables: It offers a (very limited, but sometimes handy) form of "thread safety": It is made sure that the length of an array (like the tab array in the getNode method of HashMap) can not change while the method is being executed.