Tags: java, equality, primitive

How do Java's equality tests (`==`) work between primitives of the same type (e.g. int, char), and which is most efficient?


How do Java's equality tests work between primitives? I'm specifically looking for efficiency (and speed) within single-type comparisons. Is the char == char comparison faster than int == int?

I know that under the hood, chars are treated as numbers. I'm having difficulty finding any explanation of how == truly functions with primitives (it understandably doesn't have a JavaDoc/Oracle page, and I can't find its source code). The only tangentially related data I found is that int uses 4 bytes, while char uses 2. (I don't need the extra size; all the data I'm comparing would be real-world characters, which fit in Java's 2-byte char, unlike C's 1-byte char for ASCII.)

Is the speed the same, or is char faster due to its smaller size?

I understand that the code underlying these operations may be written in C or even assembly. If it's in C (or something equivalent) or higher, where would I be able to find it so I can better understand how it works?


Solution

  • Is the speed the same, or is char faster due to its smaller size?

    No. There are many reasons and they all say no.

    The 'minor' primitives say: No.

    How do primitives work

    You have the 4 major primitives and the 4 minors. The majors are float, double, int, and long. These have bytecode for all relevant operations. For example, you have FADD, DADD, IADD, and LADD - for, respectively, 'add two floats', 'add two doubles', 'add two ints', and 'add two longs'.

    The other 4 are the minors: char, boolean, short, and byte. These do not have any bytecode. Or, nearly no bytecode - there's bytecode to read/write to a byte/char/boolean/short array, but there is no BADD for 'add 2 bytes together'.

    The minors are represented at the class file level, and inside the JVM, as an int. Thus, trivially, to the question

    is char faster due to its smaller size

    No. Because it isn't smaller. It's 32 bits just the same. If anything, it is slower, because java has to emulate overflow for these things.
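
    As a quick illustration of the 'minors are really ints' point, here is a minimal sketch (plain Java, nothing JVM-specific is assumed): arithmetic on two chars is carried out as int arithmetic, and the result is an int.

    char a = 'A';
    char b = 'B';
    int sum = a + b;         // fine: char + char is int arithmetic; 65 + 66 == 131
    char c = (char) (a + b); // assigning the int result back to a char needs an explicit cast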

    CPU design says: No.

    CPUs have a size they are 'comfy' with. It's not that they cannot operate on other sizes; however, they are fastest at that size, or, at least, operations on smaller things are no faster than at the 'comfy' size. This size is generally called a word. Also, memory often has to be aligned: an operation on a memory address that isn't evenly divisible by the word size is simply not allowed and would cause a CPU fault.

    On 64-bit architecture, word size is 64-bits. That's... kinda the point: That's why 64-bit architecture is called that.

    All of java's primitives are 64-bit or smaller, so, trivially, no, things aren't faster just because they are smaller.

    In fact, due to alignment rules, generally everything is 64-bit anyway - even a boolean. Arrays are an exception (a new byte[100] takes about 100 bytes of memory, not 800; but 100 separate fields of type byte may well take 800 - that is up to the JVM implementation).

    So how does it work?

    Well, something like someChar == someChar results in bytecode you can investigate with the javap tool. And how does that end up running on a CPU? Who knows! That's the point of java: it's a spec, not a single implementation. How it ends up actually working depends on a combinatorial explosion of factors - JVM version, JVM vendor (though, practically, that is irrelevant; all vendors operate on pretty much the same codebase these days), architecture (as in, what CPU you have - things are different on ARM chips than on Intel chips, for example; even x86 vended by Intel vs. AMD can make a difference!), OS, bit mode, flags like compressedOops, whether your code has been hotspotted, and more.
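
    As a concrete example you could try yourself (the class and method names here are made up purely for illustration):

    // CharEq.java - toy class to disassemble
    class CharEq {
        static boolean eq(char a, char b) {
            return a == b; // there is no char-specific comparison instruction
        }
    }

    Compile it with javac CharEq.java and disassemble with javap -c CharEq: the comparison shows up as ordinary int-comparison bytecode (an if_icmp* instruction), exactly as it would if the parameters had been declared as ints.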

    But, in practice? The CPU has an opcode for equality and that's likely what ends up being used. How does that work? Books on how CPUs work, including treatises on microcode and such, are available; they are hundreds of pages long and require PhD-level university degrees to understand. At some point asking 'but... why?' ends up at 'well, a really long time ago there was this big bang...'.

    Some details from the spec

    Note that java, like most languages, fundamentally does not support heterogeneous operations. You simply cannot add a float and a double together, period - at least, not at the class file level. There's FADD (add 2 floats) and DADD (add 2 doubles). There is no FDADD or DFADD. So, when you write:

    float x = 1.5f;
    double y = 2.5;
    System.out.println(x + y);
    

    Strictly speaking, that addition is not possible. However, javac makes it possible by silently adding a conversion - the bytecode generated here will first convert x to a double, then add the two doubles, and that result is then passed to the println method. So why does javac get to just silently inject that conversion (which you can also force manually with a cast - System.out.println((double) x + y) is allowed, and generates exactly the same bytecode)?

    Because the java spec says that if an operation is not possible, but applying a 'widening conversion' to one of the two parts of the operation would make it possible, javac will inject that conversion silently. The widening primitive conversions are:

    • byte to short, int, long, float, or double
    • short to int, long, float, or double
    • char to int, long, float, or double
    • int to long, float, or double
    • long to float or double
    • float to double
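
    This same silent widening is what makes mixed-type == comparisons work, which ties back to the original question. A minimal sketch:

    char c = 'A';
    int i = 65;
    System.out.println(c == i); // true: c is widened to an int, then two ints are compared

    long l = 65L;
    System.out.println(c == l); // also fine: c is widened to a long, then two longs are compared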

    Note that the cast is implied and automatic, even for a narrowing conversion, when using the compound assignment operators. This is legal java:

    float x = 5.5f;
    int y = 10;
    y += x;
    

    But this is a compiler error:

    float x = 5.5f;
    int y = 10;
    y = y + x;
    

    Because the second one first converts y to a float, then adds the two floats, and then tries to assign that float to an int variable - which is a narrowing conversion, and therefore isn't allowed without an explicit cast.
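
    The reason the compound form works is that the spec defines y += x as shorthand for y = (int) (y + x) - the cast to the left-hand side's type is built in. A minimal sketch of that equivalence:

    float x = 5.5f;
    int y = 10;
    y += x;            // allowed: behaves like y = (int) (y + x)
    // y = y + x;      // would not compile: y + x is a float, and float to int is narrowing
    y = (int) (y + x); // the explicit cast makes the long form legal again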