
Why does Java's StringLatin1.regionMatchesCI method call toUpperCase() and then toLowerCase() when comparing chars?


I was looking into the String.equalsIgnoreCase method and found that it ends up invoking the StringLatin1.regionMatchesCI method.

However, the code of this method seems strange to me, here it is:

public static boolean regionMatchesCI(byte[] value, int toffset,
                                      byte[] other, int ooffset, int len) {
    int last = toffset + len;
    while (toffset < last) {
        char c1 = (char)(value[toffset++] & 0xff);
        char c2 = (char)(other[ooffset++] & 0xff);
        if (c1 == c2) {
            continue;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            continue;
        }
        if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
            continue;
        }
        return false;
    }
    return true;
}

Why compare the uppercase forms and then the lowercase forms? If the uppercase comparison fails, wouldn't the lowercase comparison always fail too? Am I missing something?


Solution

  • A copy of the source code for this function that I found (somewhere on Google) includes comments with an additional explanation:

            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // Unfortunately, conversion to uppercase does not work properly
            // for the Georgian alphabet, which has strange rules about case
            // conversion.  So we need to make one last check before
            // exiting.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
    

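    So it looks like a workaround: for some characters the uppercase forms differ even though the characters should be treated as equal, and the final lowercase comparison catches those pairs. On GitHub you may find other implementations of this function.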
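
    To see the last check make a difference, here is a minimal sketch that repeats the three comparisons from the quoted loop body on single chars. The class name CaseFoldDemo and the helper firstMatchingStep are hypothetical names made up for illustration, not part of the JDK. The pair used here, U+0130 (capital I with dot above) and 'i', is not Georgian and lies outside Latin-1, so in a real String it would presumably go through the UTF-16 variant of the method rather than StringLatin1; it is only meant to show the same upper-then-lower logic at work.

```java
public class CaseFoldDemo {

    // Hypothetical helper: mirrors the three checks in the quoted loop body
    // and reports which one (if any) accepted the pair.
    static String firstMatchingStep(char c1, char c2) {
        if (c1 == c2) {
            return "exact";
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return "upper";
        }
        // Last-chance check: lowercase the uppercased forms and compare again.
        if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
            return "lower";
        }
        return "none";
    }

    public static void main(String[] args) {
        // Ordinary pair: the uppercase comparison is enough.
        System.out.println(firstMatchingStep('a', 'A'));      // upper

        // U+0130 uppercases to itself and 'i' uppercases to 'I', so the
        // uppercase forms differ, but both lowercase to 'i'.
        System.out.println(firstMatchingStep('\u0130', 'i')); // lower

        // The library method agrees that the two strings match.
        System.out.println("\u0130".equalsIgnoreCase("i"));   // true
    }
}
```

    Without the lowercase step, the second pair would be reported as a mismatch, which is the kind of case the comment in the source describes.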