c, if-statement, relational-operators

Why can we compare character constants?


I came across this statement in a C book:

    if ((letter >= 'P') && (letter <= 'S'))

It's trying to check whether the letter falls between P and S (inclusive), and I was quite surprised to see it work.

  1. How is it possible to do greater-than/less-than operations on letters (values of the character data type)?
  2. Is it a feature of C only, or of other programming languages as well?

Solution

  • How is it possible

    To invert the question: why wouldn’t it be? Our alphabet is naturally ordered, and having an ordering between letters is convenient and just makes sense. In C in particular, character constants such as 'P' are simply integers (they have type int), so ordering them works exactly like ordering any other numbers.
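As a minimal sketch, assuming an ASCII-compatible execution character set (which almost every current system uses), the test from the question can be wrapped in a function. The name is_between_p_and_s is hypothetical, chosen only for illustration. The C11 _Static_assert checks at compile time that a character constant has the size of an int, as expected for a value of type int.

```c
#include <stdbool.h>

/* In C, a character constant such as 'P' has type int (in C++ it is char),
   so this compile-time check holds for any conforming C compiler. */
_Static_assert(sizeof('P') == sizeof(int), "character constants have type int in C");

/* Hypothetical helper: the same range test as in the question.
   Assumes an ASCII-compatible encoding, where 'P'..'S' are consecutive codes. */
bool is_between_p_and_s(char letter)
{
    /* Both operands are promoted to int and compared as plain integers. */
    return letter >= 'P' && letter <= 'S';
}
```

With ASCII, 'P' is 80 and 'S' is 83, so the function accepts exactly 'P', 'Q', 'R' and 'S'.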

    On a more technical level, characters are mapped to integers by a character encoding. Different encodings exist; the C standard doesn’t mandate a particular one, and which one is used isn’t important for comparing characters, as long as it’s consistent. This also means that C guarantees neither that this ordering corresponds to a particular alphabet nor that letters are consecutive. The only such guarantee covers the decimal digits '0' to '9', which must have contiguous, increasing values. For example, in EBCDIC there is a gap between 'R' and 'S', so the test from the question would also accept some codes that are not letters.
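If code must not depend on letters being consecutive, one portable option is to test membership in an explicit list of characters instead of a range. The sketch below assumes that an explicit list is acceptable for the use case; is_p_to_s_portable and digit_value are hypothetical helper names. The second function relies on the one ordering guarantee the C standard does make: the digits '0' to '9' have contiguous, increasing values, so subtracting '0' is portable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true only for 'P', 'Q', 'R', 'S', whatever the encoding.
   strchr also finds the terminating '\0', so that value is excluded explicitly. */
bool is_p_to_s_portable(char letter)
{
    return letter != '\0' && strchr("PQRS", letter) != NULL;
}

/* Hypothetical helper: numeric value of a decimal digit, or -1 otherwise.
   Portable because the standard requires '0'..'9' to be contiguous. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

The list-based check costs a short scan per call, which is usually negligible; for heavier use, a lookup table indexed by unsigned char is a common alternative.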

  • Is it a feature of C only, or of other programming languages as well?

    Virtually every modern programming language allows order comparison of character types.


    Caveat: Many languages extend this capability to character strings; such a comparison is called lexicographical comparison. C supports it too, e.g. via strcmp, but it’s important to note that in C the relational operators (<, >, <=, >=) do not compare string contents. Unfortunately, C will usually accept an attempt to do so without complaining:

    #include <string.h> /* for strcmp */

    char a[] = "hello";
    char b[] = "world";
    if (a < b) { /* wrong: compares addresses, not contents! */ }
    if (strcmp(a, b) < 0) { /* correct way. */ }
    

    The first if compiles, but it does the wrong thing: instead of comparing the string contents, it converts a and b to pointers to their first elements and compares those two pointer values. Because a and b are distinct objects, comparing them with < is undefined behaviour (!), which means that the program is not valid C, the result is unpredictable, and the compiler is not required to tell you that anything is wrong.
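To make the difference concrete, here is a minimal sketch of what a lexicographic comparison does, roughly mirroring the contract of strcmp (negative, zero or positive result; bytes compared as unsigned char). lex_compare is a hypothetical name, not a standard function; in real code, simply call strcmp.

```c
/* Hypothetical strcmp-like function: compares two NUL-terminated strings
   character by character and returns <0, 0, or >0. */
int lex_compare(const char *a, const char *b)
{
    /* Like strcmp, compare bytes as unsigned char. */
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    /* Advance while both strings agree and the first has not ended. */
    while (*pa != '\0' && *pa == *pb) {
        ++pa;
        ++pb;
    }
    /* Either the bytes differ, or both strings ended together (0 - 0 == 0).
       If only one string ended, its '\0' sorts before any other byte,
       so a prefix compares as smaller. */
    return (int)*pa - (int)*pb;
}
```

For instance, lex_compare("hello", "world") is negative because 'h' comes before 'w' in ASCII, which matches the sign strcmp reports for the same strings.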