
sscanf() with %s matches too greedily


I am trying to parse a string with the format 1..4. I want to parse the two parts as strings rather than as numbers, because they could be anything.

However, using sscanf():

char *str = "1..4";
char a[128];
char b[128];
int c = sscanf(str, "%s..%s", a, b);
printf("a=%s, b=%s, c=%d\n", a, b, c);

Gives the following output:

a=1..4, b=��, c=1

Only one string is parsed? Am I doing something wrong or is this a bug in sscanf?

If I remove the dots:

char *str = "1 4";
char a[128];
char b[128];
int c = sscanf(str, "%s %s", a, b);
printf("a=%s, b=%s, c=%d\n", a, b, c);

I get:

a=1, b=4, c=2

Which is what I'd expect in the first example.

According to the documentation, . is not a special character in the format string.

Changing the format string to parse as integers instead works as well:

char *str = "1..4";
int a;
int b;
int c = sscanf(str, "%d..%d", &a, &b);
printf("a=%d, b=%d, c=%d\n", a, b, c);

Gives:

a=1, b=4, c=2

Solution

  • The manual describes %s in sscanf() as follows:

    Matches a sequence of non-white-space characters; the next pointer must be a pointer to character array that is long enough to hold the input sequence and the terminating null byte ('\0'), which is added automatically. The input string stops at white space or at the maximum field width, whichever occurs first.

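    Because %s stops only at white space (or at the maximum field width), the first %s consumes the whole of "1..4", dots included. Nothing is left for the literal ".." in the format to match, so sscanf() returns 1 and b is never written. This is documented behavior, not a bug. Unless you tell sscanf() more about the strings you are trying to extract, a plain %s cannot do what you want.

    Since you don't want the first scanned argument to contain any periods, my suggestion is to use the %[] conversion to specify which characters are not allowed: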

    Matches a nonempty sequence of characters from the specified set of accepted characters; the next pointer must be a pointer to char, and there must be enough room for all the characters in the string, plus a terminating null byte. The usual skip of leading white space is suppressed. The string is to be made up of characters in (or not in) a particular set; the set is defined by the characters between the open bracket [ character and a close bracket ] character. The set excludes those characters if the first character after the open bracket is a circumflex (^). To include a close bracket in the set, make it the first character after the open bracket or the circumflex; any other position will end the set. The hyphen character - is also special; when placed between two other characters, it adds all intervening characters to the set. To include a hyphen, make it the last character before the final close bracket. For instance, [^]0-9-] means the set "everything except close bracket, zero through nine, and hyphen". The string ends with the appearance of a character not in the (or, with a circumflex, in) set or when the field width runs out.

    If you are willing to accept any first string that does not contain a period, try something like the following, which excludes periods:

    const char *str = "1..4";
    char a[128];
    char b[128];
    /* The field width 127 leaves room for the terminating '\0' in each buffer. */
    int c = sscanf(str, "%127[^.]..%127s", a, b);
    printf("a=%s, b=%s, c=%d\n", a, b, c);
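
The same idea can be wrapped in a small function that also checks the return value, so that a and b are only used when both conversions succeeded. This is a minimal sketch, not a definitive implementation: the name split_range and the fixed 128-byte buffers are assumptions made for illustration and do not come from the question.

```c
#include <stdio.h>

/* Hypothetical helper (name and buffer size are illustrative assumptions).
 * Splits "LEFT..RIGHT" into a and b, each of which must hold 128 bytes.
 * Returns the number of fields converted: 2 means both parts were read. */
int split_range(const char *str, char a[128], char b[128])
{
    /* Start from empty strings so the buffers are never left uninitialized. */
    a[0] = '\0';
    b[0] = '\0';

    /* %127[^.] : one or more characters that are not '.', at most 127
     * ..       : two literal dots
     * %127s    : non-white-space characters, at most 127 */
    int n = sscanf(str, "%127[^.]..%127s", a, b);

    /* sscanf() returns EOF if the input ends before the first conversion;
     * report that as zero converted fields. */
    return n < 0 ? 0 : n;
}
```

A caller would then test the result before touching the buffers, for example:

    char a[128], b[128];
    if (split_range("1..4", a, b) == 2)
        printf("a=%s, b=%s\n", a, b);

Note that %[ does not skip leading white space, so any leading blanks in the input end up in a.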