c string suffix-array programming-pearls

Why does this example use null padding in string comparisons? “Programming Pearls”: Strings of Pearls


In "Programming Pearls": Strings of Pearls, Section 15.3 (Generating Text), the author shows how to generate random text from an input document. There are a few things in the source code that I don't understand.

    for (i = 0; i < k; i++)
        word[nword][i] = 0;

The author explains: "After reading the input, we append k null characters (so the comparison function doesn't run off the end)." This explanation confuses me, since the program still works after I comment out these two lines. Why are they necessary?


Solution

  • The padding reduces the number of special cases you have to deal with when doing character-by-character comparisons.

     alphabet
     alpha___
    

    If you stepped through these one letter at a time and the null padding after alpha weren't there, you'd try to examine the next element and run right off the end of the array. The null padding ensures that wherever there's a character in one word, there's a corresponding character in the other. And since the null character has the value 0, the shorter word is always going to be considered 'less than' the longer one.

    As to why it seems to work without those lines, there are two related reasons I can think of:

    1. This was written in C. C does not check array bounds; reading past the end of an array is undefined behavior, so you can read whatever junk data lies beyond the space that was allocated, and you'd never hear a thing.
    2. Your input document happens to be such that you never compare two strings where one is a prefix of the other (like alpha is to alphabet).
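
To make the "running off the end" case concrete, here is a minimal sketch in C of a comparison that looks at k consecutive words stored back to back in one buffer. It is only an assumption about the general shape of such a comparison function, not a copy of the book's code; the names wordncmp_k and text, the explicit k parameter, and the sample input are made up for illustration.

```c
#include <assert.h>

/* Hypothetical sample input: the words "of the of the" stored back to
   back, each terminated by a NUL, followed by K = 2 extra NULs of padding. */
enum { K = 2 };
static const char text[] = {
    'o', 'f', 0,            /* offset 0  */
    't', 'h', 'e', 0,       /* offset 3  */
    'o', 'f', 0,            /* offset 7  */
    't', 'h', 'e', 0,       /* offset 10 */
    0, 0                    /* offsets 14-15: the K padding NULs */
};

/* Compare the k words starting at p with the k words starting at q.
   Returns 0 if all k words match, otherwise the difference of the first
   mismatching bytes. The loop only stops on a mismatch or after k word
   terminators, so it relies on the buffer holding enough NULs after the
   last real word. */
static int wordncmp_k(const char *p, const char *q, int k)
{
    assert(k > 0);
    for (; *p == *q; p++, q++)
        if (*p == 0 && --k == 0)
            return 0;       /* k whole words matched */
    return (unsigned char)*p - (unsigned char)*q;
}
```

With this buffer, wordncmp_k(text + 3, text + 10, K) compares "the of" against the final "the". After the first word matches, the function reads text[14], which is the first padding byte. Without the padding, that read would fall outside the array: undefined behavior that may still appear to work, which fits the behavior described in the question.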