
strtok_r save state behaviour


The correct way to use strtok_r is as follows:

char* str = strdup(string);
char* save;
char* ptr = strtok_r(str, delim, &save);
while(ptr) {
  puts(ptr);
  ptr = strtok_r(NULL, delim, &save);
}

When I inspected what is actually stored in save, I found that it is just the rest of the unparsed string. So I tried to make every call look the same and wrote the following wrapper:

char* as_tokens(char** str, const char* const delim) {
  return strtok_r(NULL, delim, str);
}

It can be used as shown below, which is much less verbose, since there is no need to distinguish the first call from the rest:

char* str = strdup(string);
char* ptr;
while(ptr = as_tokens(&str, delim))
  puts(ptr);

Are there any downsides to this approach? Am I causing any undefined behavior? I tried some edge cases, and both approaches behave the same way.

Online Compiler: https://wandbox.org/permlink/rkGiwXOUtzqrbMpP

P.S. I am ignoring memory leaks for brevity.


Update

There is already a function very similar to my as_tokens: strsep. It differs when there are consecutive delimiters: strsep returns an empty string for each empty field between them, while as_tokens (i.e. strtok_r) treats a run of delimiters as a single separator.


Solution

  • Are there any downsides to this approach?

    Yes. Because strtok_r stores its position through the pointer you pass as the third argument, as_tokens overwrites str, so its original value is lost and (in this case) the buffer can no longer be freed. You therefore have a memory leak. That could be solved by keeping a separate copy of the pointer, but that boils down to very nearly the same thing as your first code.

    Additionally, as was observed in the comments, it does not comply with the specification of strtok_r: a call to strtok_r with a NULL first argument is defined only as the continuation of a previous call to strtok_r that stored the value to which the third argument points.

    Also, it departs from idiomatic, well-understood use of strtok_r, even going so far as to hide it in a different function. The normal idiom is not onerous, and it is well known and understood. Being clever about it makes your code a bit harder to maintain.

  • Am I causing any undefined behavior?

    Yes, in the sense of "behavior that is not defined", as opposed to behavior that is explicitly labeled undefined. But the relevant standards attach the same significance to both; the C standard, for instance, states that there is no difference in emphasis between them. See above.