I'm writing a program where I need to parse some configuration files in addition to user input from a graphical user interface. In particular, I'm having issues with parsing strings taken from the configuration file into floats, as the function I've been using for this purpose so far, strtof(), respects the current locale. This means a string that represents a floating point number may parse into 0.10000000149011612 in one locale and 0 in another, which is not good. This is because some locales use the full stop (.) as the decimal separator whereas others use a comma (,), but the strings from the configuration file always use a full stop.
These configuration files are distributed to users in identical format regardless of their locale, and it is not feasible to distribute different versions depending on the locale they have set, especially as the files are a global, immutable resource that is part of the operating system base, and a system may have multiple users that aren't necessarily using the same locale.
I can't just set the locale to something predictable at program startup because removing support for i18n is a non-starter. I also want to preserve locale-specific parsing for user input, as mentioned earlier. I also don't think I can safely call setlocale(LC_ALL, "C") when I start parsing and then finish with setlocale(LC_ALL, "whatever it was before"), as this is a multi-threaded program and I can't guarantee that other threads aren't doing locale-dependent work while configuration file parsing is happening.
So, how can I parse strings into floats in a locale-independent fashion in C, preferably without relying on functionality outside of the standard library? The program I'm writing only targets Linux (although it may also be possible to run it on BSDs, but they are not a priority), so Linux-specific answers are just fine.
It is indeed unfortunate that the C Standard does not provide functions to handle these conversions for a specified locale.
There is no simple portable solution to this problem using standard functions. Converting the strings from the config file to the locale-specific form is feasible but tricky.
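To illustrate what such a conversion might look like, here is a minimal sketch. The helper name parse_config_double and the 128-byte buffer are assumptions made for this illustration, not part of any library. It rewrites the first full stop to whatever localeconv() reports as the decimal point and then calls plain strtod(). The tricky parts remain: localeconv() reflects the locale in effect at the moment of the call, so in a multi-threaded program another thread calling setlocale() between the two calls would still break it.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a config-file number that uses '.' by first
 * rewriting the separator to the current locale's decimal point.
 * Returns 0 on success, -1 on failure. */
int parse_config_double(const char *s, double *out)
{
    const char *dp = localeconv()->decimal_point;   /* e.g. "." or "," */
    size_t dplen = strlen(dp);
    size_t len = strlen(s);
    char buf[128];                                  /* arbitrary limit for this sketch */
    const char *dot = strchr(s, '.');

    if (len + dplen >= sizeof buf)
        return -1;                                  /* input too long */

    if (dot == NULL) {
        memcpy(buf, s, len + 1);                    /* nothing to rewrite */
    } else {
        size_t head = (size_t)(dot - s);
        memcpy(buf, s, head);                       /* digits before '.' */
        memcpy(buf + head, dp, dplen);              /* locale separator */
        memcpy(buf + head + dplen, dot + 1, len - head); /* rest, including '\0' */
    }

    char *end;
    double v = strtod(buf, &end);
    if (end == buf || *end != '\0')
        return -1;                                  /* nothing parsed, or trailing junk */
    *out = v;
    return 0;
}
```

With the default "C" locale the rewrite is a no-op; under a locale such as fr_FR.UTF-8 the buffer would hold "0,123" before strtod() sees it.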
There is a simple workaround for the config file: use exponent notation without a decimal separator. 123e-3 is a portable, locale-neutral way of writing 0.123 or 0,123.
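If you go the exponent route, it may help to enforce it when reading the file. The following is only a sketch; parse_separator_free is a made-up name. It refuses any string that contains a full stop or a comma, so whatever reaches strtod() cannot be affected by the locale's decimal point.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accept only numbers written without a decimal
 * separator (e.g. "123e-3"). Returns 0 on success, -1 on rejection. */
int parse_separator_free(const char *s, double *out)
{
    if (strpbrk(s, ".,") != NULL)
        return -1;                 /* contains a separator: refuse it */

    char *end;
    double v = strtod(s, &end);
    if (end == s || *end != '\0')
        return -1;                 /* nothing parsed, or trailing junk */
    *out = v;
    return 0;
}
```

Here parse_separator_free("123e-3", &v) succeeds and stores a value close to 0.123, while "0.123" and "0,123" are both rejected.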
POSIX has alternate functions for most standard functions with locale-specific behavior, but unfortunately not for strtod() and friends.
However, GNU libc on Linux (as well as alternative libraries such as musl) and the BSD systems support extended POSIX locale functions:
#define _GNU_SOURCE // for linux
#include <stdlib.h>
#ifdef __APPLE__
#include <xlocale.h> // on macOS
#endif
double strtod_l(const char * restrict nptr, char ** restrict endptr,
                locale_t loc);
float strtof_l(const char * restrict nptr, char ** restrict endptr,
               locale_t loc);
long double strtold_l(const char * restrict nptr, char ** restrict endptr,
                      locale_t loc);
On macOS, it seems you can pass 0 for the loc argument and get the C locale. On Linux, loc is declared as non-null in the header file, so you need to create a C locale with newlocale.
Here is an example:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <locale.h>
#ifdef __APPLE__
#include <xlocale.h>
#endif
locale_t c_locale;

int main(void) {
    const char locale_name[] = "fr_FR.UTF-8";
    const char locale_string[] = "0,123";
    const char standard_string[] = "0.123";

    // A separate locale object for the "C" locale; the global locale is untouched.
    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (c_locale == (locale_t)0) {
        perror("newlocale");
        return 1;
    }
    // Switch the global locale to one that uses a comma as decimal separator.
    if (setlocale(LC_ALL, locale_name) == NULL) {
        fprintf(stderr, "locale %s is not available\n", locale_name);
        return 1;
    }

    double x1, x2, y1, y2;
    x1 = strtod(locale_string, NULL);
    x2 = strtod_l(standard_string, NULL, c_locale);
    // sscanf_l() and printf_l() are BSD/macOS extensions; glibc does not provide them.
    int s1 = sscanf(locale_string, "%lf", &y1);
    int s2 = sscanf_l(standard_string, c_locale, "%lf", &y2);

    printf("default locale: %s\n\n", locale_name);
    printf("using printf(...):\n");
    printf(" strtod(\"%s\", NULL) -> %f\n", locale_string, x1);
    printf(" strtod_l(\"%s\", NULL, c_locale) -> %f\n", standard_string, x2);
    printf(" sscanf(\"%s\", &y1) -> %d, y1=%f\n", locale_string, s1, y1);
    printf(" sscanf_l(\"%s\", c_locale, &y2) -> %d, y2=%f\n", standard_string, s2, y2);

    printf("\nusing printf_l(c_locale, ...):\n");
    printf_l(c_locale, " strtod(\"%s\", NULL) -> %f\n", locale_string, x1);
    printf_l(c_locale, " strtod_l(\"%s\", NULL, c_locale) -> %f\n", standard_string, x2);
    printf_l(c_locale, " sscanf(\"%s\", &y1) -> %d, y1=%f\n", locale_string, s1, y1);
    printf_l(c_locale, " sscanf_l(\"%s\", c_locale, &y2) -> %d, y2=%f\n", standard_string, s2, y2);

    freelocale(c_locale);
    return 0;
}
Output:
default locale: fr_FR.UTF-8

using printf(...):
 strtod("0,123", NULL) -> 0,123000
 strtod_l("0.123", NULL, c_locale) -> 0,123000
 sscanf("0,123", &y1) -> 1, y1=0,123000
 sscanf_l("0.123", c_locale, &y2) -> 1, y2=0,123000

using printf_l(c_locale, ...):
 strtod("0,123", NULL) -> 0.123000
 strtod_l("0.123", NULL, c_locale) -> 0.123000
 sscanf("0,123", &y1) -> 1, y1=0.123000
 sscanf_l("0.123", c_locale, &y2) -> 1, y2=0.123000
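Note that the example above relies on sscanf_l() and printf_l(), which come from the BSD/macOS xlocale extensions and are not available in glibc. For the Linux case in the question, a trimmed-down sketch that only needs newlocale() and strtof_l() could look like the following. The names config_locale_init and parse_config_float are invented for this illustration, and the error handling is just one possible choice.

```c
#define _GNU_SOURCE            /* exposes strtof_l() on glibc and musl */
#include <locale.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <xlocale.h>           /* on macOS */
#endif

static locale_t c_locale;      /* hypothetical global, created once */

/* Call once at startup, before any threads parse config files.
 * Returns 0 on success, -1 if the "C" locale object could not be created. */
int config_locale_init(void)
{
    c_locale = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    return c_locale == (locale_t)0 ? -1 : 0;
}

/* Parse a config-file float with '.' as decimal separator, regardless of
 * the global locale. Returns 0 on success, -1 on malformed input. */
int parse_config_float(const char *s, float *out)
{
    char *end;
    float v = strtof_l(s, &end, c_locale);
    if (end == s || *end != '\0')
        return -1;             /* nothing parsed, or trailing junk */
    *out = v;
    return 0;
}
```

Because this never calls setlocale(), user input can still go through plain strtof() with the user's locale, and other threads doing locale-dependent work should be unaffected.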