c language-lawyer strtok

Is `strtok(NULL, "")` portable according to the C standard?


char buffer[] = "head:rest(with possibly more : delimiters)";
char* head = strtok(buffer, ":"); // head -> "head"
char* rest = strtok(NULL, "");    // rest -> "rest(with possibly more : delimiters)"

This works as expected on glibc: the first token is "head", and the second call to strtok with an empty string as delimiter ("") gives me the rest of the original string.

But is this usage portable? Specifically: is passing an empty string as the second argument to strtok well-defined according to the ISO C standard?

I've checked a few man pages and references, but haven't found a clear statement for or against this pattern.


Solution

  • is passing an empty string as the second argument to strtok well-defined according to the ISO C standard?

    The spec does not speak explicitly to this case, for or against. However, we don't necessarily expect the spec to affirmatively allow specific arguments. I don't think anyone would doubt that an empty string is a valid argument to strcpy(), for example, even though the spec does not specifically allow it. The spec's description of the function's behavior is compatible with empty delimiter strings (on the first or later calls), so I see no reason to think that a conforming implementation may reject such arguments.

    For the first call, the spec says:

    The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

    The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1

    (C23 7.26.5.9/3-4)

    For an empty delimiter string, this means that the first token starts at the subject string's first character, if there is one (whatever that character is, it is not in the delimiter string), and extends to the end of the string (because none of the remaining characters can be in the delimiter string either).
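To make the first-call reading concrete, here is a minimal sketch. The helper name `whole_string_token` is hypothetical and exists only for illustration; the behavior described in the comments is what the quoted wording implies, not something a particular implementation documents.

```c
#include <string.h>

/* Hypothetical helper: tokenize `s` with an empty separator string.
 * Under the reading above, the result should be `s` itself when `s`
 * is non-empty (the token spans the whole string and nothing is
 * overwritten), and a null pointer when `s` is empty (no character
 * outside the separator set can be found). */
char *whole_string_token(char *s)
{
    return strtok(s, "");
}
```

Called on a writable buffer holding "a:b", this should return a pointer to the start of the buffer and leave its contents unchanged; called on an empty buffer, it should return a null pointer.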

    For the second call, the spec says:

    Each subsequent call, with a null pointer as the value of the first argument, starts searching from the saved pointer and behaves as described previously.

    (C23 7.26.5.9/5)

    This is no more problematic than the first-call case. If any characters remain to be tokenized, then the token encompasses all of them; if none remain, strtok returns a null pointer.
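Putting both calls together, the pattern from the question can be wrapped as in the sketch below. The names `struct head_rest` and `split_head_rest` are assumptions made for this example, not part of any library, and the sketch relies on the interpretation of the standard given above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: both pointers refer into the caller's buffer. */
struct head_rest {
    char *head; /* first token, or NULL if the buffer holds no token */
    char *rest; /* everything after the delimiter that ended head, or NULL */
};

/* Hypothetical helper: split `buf` in place into a head token and the
 * untokenized remainder. `buf` must be writable, because strtok
 * overwrites the delimiter that terminates the head token. */
struct head_rest split_head_rest(char *buf, const char *delims)
{
    struct head_rest r;

    r.head = strtok(buf, delims);
    /* Only continue the sequence if the first call found a token. */
    r.rest = (r.head != NULL) ? strtok(NULL, "") : NULL;
    return r;
}
```

One consequence worth noting: because the second call uses an empty separator string, it does not skip leading delimiters. For "head::rest" split on ":", the remainder should therefore be ":rest" rather than "rest". As with any use of strtok, the function keeps hidden state between calls, so this pattern is not reentrant.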