Parse human-readable sizes (k, M, G, T) into bytes in C


I'm looking for a quick way to parse human-readable byte sizes (examples: 100, 1k, 2M, 4G) into byte values. The input is a char * and the output must be a size_t (an unsigned integer type, typically 64-bit or 32-bit depending on the architecture). The code should detect invalid input and return a value indicating that the input was invalid.

Examples:

Input  => size_t result
-----------------------
"100"  => 100
"10k"  => 10240
"2M"   => 2097152
"4G"   => 4294967296 on 64-bit machine, error (overflow) on 32-bit machine
"ten"  => error
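The expected values follow from binary multiples (k = 2^10, M = 2^20, G = 2^30, and, assuming the title's T continues the pattern, T = 2^40). The 32-bit overflow for "4G" comes from comparing against the largest value a 32-bit size_t can hold. More generally, a number x with shift s fits exactly when x is at most SIZE_MAX divided by 2^s (rounded down), which is what a right shift of SIZE_MAX computes:

```latex
\begin{aligned}
\texttt{10k} &= 10 \cdot 2^{10} = 10\,240 \\
\texttt{2M}  &= 2 \cdot 2^{20} = 2\,097\,152 \\
\texttt{4G}  &= 4 \cdot 2^{30} = 2^{32} = 4\,294\,967\,296 > 2^{32} - 1 \quad (\texttt{SIZE\_MAX} \text{ for a 32-bit } \texttt{size\_t}) \\[4pt]
x \cdot 2^{s} \le \texttt{SIZE\_MAX} &\iff x \le \left\lfloor \texttt{SIZE\_MAX} / 2^{s} \right\rfloor = \texttt{SIZE\_MAX} \gg s
\end{aligned}
```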

Here is an example fragment of code to be expanded to handle the unit prefixes:

#include <stddef.h>
#include <stdio.h>

int parse_human_readable_byte_size(const char *input, size_t *result) {
    /* TODO: needs to support k, M, G, etc... */
    return sscanf(input, "%zu", result) == 1;
}

Additional requirement:

The code is expected to run only a few times per program execution, so short, readable code is preferred over longer, higher-performance code.


Solution

  • Here is a potential implementation. Code to detect all errors is included; fill in your own handling in place of the gotos if you like.

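    #include <ctype.h>      /* isdigit */
    #include <errno.h>
    #include <inttypes.h>   /* strtoumax, uintmax_t; includes <stdint.h> for SIZE_MAX */
    #include <stddef.h>     /* size_t */

    int parse_human_readable_byte_size(const char *s, size_t *result) {
        char *endp;
        int sh;
        if (!isdigit((unsigned char)*s)) goto error;  /* strtoumax would also accept spaces and +/- */
        errno = 0;
        uintmax_t x = strtoumax(s, &endp, 10);
        if (errno || endp == s) goto error;
        switch (*endp) {
        case 'k': sh = 10; break;
        case 'M': sh = 20; break;
        case 'G': sh = 30; break;
        case 'T': sh = 40; break;
        case 0:   sh = 0;  break;
        default:  goto error;
        }
        if (sh && endp[1]) goto error;  /* nothing may follow the unit letter */
        /* shift in uintmax_t (at least 64 bits) so sh = 40 is defined even for a 32-bit size_t */
        if (x > ((uintmax_t)SIZE_MAX >> sh)) goto error;
        *result = (size_t)(x << sh);
        return 1;
    error:
        return 0;
    }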
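Since the question favors short code, here is a sketch of a more compact variant (the name `parse_size_compact` is hypothetical, and the binary units k/M/G/T are assumed as above). Instead of one `case` per unit, it looks the unit letter up in the string "kMGT" with `strchr` and turns its position into the shift. One subtlety: `strchr` also matches the string's terminating '\0', so the end-of-input case has to be handled before the lookup.

```c
#include <ctype.h>      /* isdigit */
#include <errno.h>
#include <inttypes.h>   /* strtoumax, uintmax_t; includes <stdint.h> for SIZE_MAX */
#include <stddef.h>     /* size_t */
#include <string.h>     /* strchr */

/* Hypothetical compact variant: the unit letter's position in "kMGT"
 * gives the power of 1024 (k -> 1, M -> 2, G -> 3, T -> 4). */
static int parse_size_compact(const char *s, size_t *result) {
    static const char units[] = "kMGT";
    char *endp;
    int sh = 0;

    if (!isdigit((unsigned char)*s)) return 0;   /* rejects "", " 1", "-1", "+1" */
    errno = 0;
    uintmax_t x = strtoumax(s, &endp, 10);
    if (errno) return 0;                         /* ERANGE: too large for uintmax_t */

    if (*endp) {
        /* Only reached when *endp != '\0'; strchr(units, '\0') would match the terminator. */
        const char *p = strchr(units, *endp);
        if (!p || endp[1]) return 0;             /* unknown unit or trailing characters */
        sh = 10 * (int)(p - units + 1);
    }
    if (x > ((uintmax_t)SIZE_MAX >> sh)) return 0;  /* x << sh would not fit in size_t */
    *result = (size_t)(x << sh);
    return 1;
}
```

For example, "1kB" is rejected because a character follows the unit, and "4G" succeeds only where size_t is wider than 32 bits. The lookup string trades the explicit `switch` for a single line per unit added, at the cost of the small '\0' pitfall noted above.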