
stod does not work correctly with boost::locale


I am trying to use boost::locale and std::stod together in a German locale, where a comma is the decimal separator. Consider this code:

boost::locale::generator gen;

std::locale loc("");  // (1)
//std::locale  loc = gen("");  // (2)

std::locale::global(loc);
std::cout.imbue(loc);

std::string s = "1,1";  //float string in german locale!
double d1 = std::stod(s);
std::cout << "d1: " << d1 << std::endl;

double d2 = 2.2;
std::cout << "d2: " << d2 << std::endl;

std::locale loc("") creates the correct locale and the output is

d1: 1,1
d2: 2,2

as I expect. When I comment out line (1) and uncomment line (2), the output is

d1: 1
d2: 2.2

The result for d2 is to be expected. As far as I understand, boost::locale wants me to explicitly specify that d2 should be formatted as a number, and doing

std::cout << "d2: " << boost::locale::as::number << d2 << std::endl;

fixes the output to 2,2 again. The problem is that std::stod no longer considers 1,1 a valid floating-point number and truncates it to 1.

My question is: why does std::stod stop working when I generate my locale with boost::locale?

Additional information: I am using VC++2015, Boost 1.60, no ICU, Windows 10

Update:

I noticed that the problem is fixed when I set the global locale twice, first with std::locale("") and then with boost:

std::locale::global(std::locale(""));
bl::generator gen;
std::locale::global(gen(""));

I have no idea why it behaves this way, though!


Solution

  • Long story short: boost::locale changes only the global c++-locale object, but not the C-locale. std::stod uses the C-locale and not the global c++-locale object. Setting the global locale from a named std::locale (such as std::locale("")) via std::locale::global changes both: the global c++-locale object and the C locale.


    The whole story: std::locale is a subtle thing and responsible for a lot of debugging!

    Let's start with the c++ class std::locale:

      std::locale loc("de_DE.utf8");  
      std::cout<<loc.name()<<"\n\n\n";
    

    creates the German locale (if it is available on your machine, otherwise it throws), which results in de_DE.utf8 on the console.

    However it does not change the global c++ locale object, which is created at the start-up of your program and is classical ("C"). The constructor of std::locale without arguments returns a copy of the global state:

    ...
      std::locale loc2;
      std::cout<<loc2.name()<<"\n\n\n";
    

    Now you should see C if nothing messed up your locale before. std::locale("") does some magic: it finds out the preferences of the user and returns them as an object, without changing the global state.

    You can change the global state with std::locale::global:

      std::locale::global(loc);
      std::locale loc3;
      std::cout<<loc3.name()<<"\n\n\n";
    

    The default constructor results this time in de_DE.utf8 on the console. We can restore the global state to the classical by calling:

      std::locale::global(std::locale::classic());
      std::locale loc4;
      std::cout<<loc4.name()<<"\n\n\n";
    

    which should give you C again.
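Restoring the global state by hand is easy to forget. Here is a minimal sketch of a scope guard that saves the global c++ locale and puts it back when it goes out of scope. The class name ScopedGlobalLocale is invented for this illustration and is not part of the standard library:

```cpp
#include <locale>

// Hypothetical helper (not a standard class): installs a global C++ locale
// for the lifetime of the object and restores the previous one afterwards.
class ScopedGlobalLocale {
public:
    explicit ScopedGlobalLocale(const std::locale& replacement)
        // std::locale::global returns the locale that was global before the call
        : previous_(std::locale::global(replacement)) {}

    ~ScopedGlobalLocale() { std::locale::global(previous_); }

    // Copying a guard would restore the old locale twice, so forbid it.
    ScopedGlobalLocale(const ScopedGlobalLocale&) = delete;
    ScopedGlobalLocale& operator=(const ScopedGlobalLocale&) = delete;

private:
    std::locale previous_;
};
```

Inside a block, `ScopedGlobalLocale guard(loc);` would make loc the global locale until the closing brace. The guard can only restore what std::locale::global itself manages; as explained further down, the C locale is a separate piece of state.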

    Now, when std::cout is created, it clones its locale from the global c++ state (here we do it with stringstreams, but it is the same). Later changes of the global state do not affect the stream:

     //classical formatting
      std::stringstream c_stream;
    
     //German formatting:
      std::locale::global(std::locale("de_DE.utf8"));
      std::stringstream de_stream;
    
      //same global locale, different results:
      c_stream<<1.1;
      de_stream<<1.1;
    
      std::cout<<c_stream.str()<<" vs. "<<de_stream.str()<<"\n";
    

    This gives you 1.1 vs. 1,1: the first is the classical format, the second the German one.
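The same effect can be reproduced on a machine without an installed German locale. The sketch below assumes a hand-written facet (CommaDecimal, a name invented here) that only swaps the decimal separator; it is a stand-in for de_DE.utf8, not a full German locale:

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: comma as decimal separator, nothing else.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Formats 1.1 through two streams: one created before and one created after
// the global C++ locale is changed.
std::string before_and_after() {
    const std::locale saved;  // copy of the current global locale

    std::locale::global(std::locale::classic());
    std::ostringstream before;  // clones the classic locale

    std::locale::global(std::locale(std::locale::classic(), new CommaDecimal));
    std::ostringstream after;   // clones the comma locale

    // Both streams are written while the same global locale is active.
    before << 1.1;
    after << 1.1;

    std::locale::global(saved);  // put the previous global locale back
    return before.str() + " vs. " + after.str();
}
```

Calling before_and_after() should return 1.1 vs. 1,1, because each stream keeps the locale it copied at construction time.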

    You can change the stream's own locale object with imbue(std::locale::classic()). It goes without saying that this doesn't change the global state:

      de_stream.imbue(std::locale::classic());
      de_stream<<" vs. "<<1.1;
      std::cout<<de_stream.str()<<"\n";
      std::cout<<"global c++ state: "<<std::locale().name()<<"\n";
    

    and you see:

    1,1 vs. 1.1
    global c++ state: de_DE.utf8
    

    Now we come to std::stod. As you can imagine, it uses the global c++ locale state (not entirely true, bear with me) and not the (private) state of the cout stream:

    std::cout<<std::stod("1.1")<<" vs. "<<std::stod("1,1")<<"\n";
    

    gives you 1 vs. 1.1 because the global state is still "de_DE.utf8", so the first parse stops at '.'. The results are printed with a dot because the local state of std::cout is still "C". After restoring the global state we get the classical behaviour:

      std::locale::global(std::locale::classic());
      std::cout<<std::stod("1.1")<<" vs. "<<std::stod("1,1")<<"\n";
    

    Now the German "1,1" is not parsed properly: 1.1 vs. 1

    Now you might think we are done, but there is more - I promised to tell you about std::stod.

    Next to the global c++ locale there is the so-called (global) C locale (it comes from the C language and is not to be confused with the classical "C" locale). Each time we changed the global c++ locale above, the C locale was changed too, because std::locale::global also calls std::setlocale when the new locale has a name.

    Getting/setting of the C locale can be done with std::setlocale(...). To query the current value run:

    std::cout<<"(global) C locale is "<<std::setlocale(LC_ALL,NULL)<<"\n";
    

    to see (global) C locale is C. To set the C locale run:

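      //call setlocale outside of assert: with NDEBUG the asserted expression is not evaluated
      const char *res = std::setlocale(LC_ALL,"de_DE.utf8");
      assert(res!=NULL);
      std::cout<<"(global) C locale is "<<std::setlocale(LC_ALL,NULL)<<"\n";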
    

    which yields (global) C locale is de_DE.utf8. But what is now the global c++ locale?

    std::cout<<"global c++ state: "<<std::locale().name()<<"\n";
    

    As you may expect, C knows nothing about c++ global locale and leaves it unchanged: global c++ state: C.
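To watch the two pieces of state side by side while debugging, a small helper can report both names at once. This is only a sketch; LocaleNames and current_locale_names are names invented for this example:

```cpp
#include <clocale>
#include <locale>
#include <string>

// Hypothetical helper type: the names of the two independent global locales.
struct LocaleNames {
    std::string cpp_global;  // name of the global C++ locale object
    std::string c_global;    // name of the (global) C locale
};

LocaleNames current_locale_names() {
    // Passing a null pointer only queries the C locale, it does not change it.
    const char* c_name = std::setlocale(LC_ALL, nullptr);
    return LocaleNames{std::locale().name(), c_name ? c_name : "(unknown)"};
}
```

For instance, after std::locale::global(std::locale::classic()) followed by std::setlocale(LC_ALL, ""), the cpp_global member should still read C, whatever the user's environment selects for the C locale.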

    Now we are not in Kansas any more! The old C functions use the C locale and the new c++ functions use the global c++ locale. Brace yourself for funny debugging!

    What would you expect

    std::cout<<"C: "<<std::stod("1.1")<<" vs. DE :"<<std::stod("1,1")<<"\n";
    

    to do? std::stod is a brand-new c++11 function after all and it should use global c++ locale! Think again...:

    C: 1 vs. DE :1.1
    

    It gets the German format right, because the C-locale is set to 'de_DE.utf8' and it uses old C-style functions under the hood.

    Just for the sake of completeness, the std::streams use the global c++ locale:

      std::stringstream stream;//creating with global c++ locale
      stream<<1.1;
      std::cout<<"I'm still in 'C' format: "<<stream.str()<<"\n";
    

    gives you: I'm still in 'C' format: 1.1.

    Edit: An alternative method to parse a string without messing with the global locale or being disturbed by it:

    bool s2d(const std::string &str, double  &val, const std::locale &loc=std::locale::classic()){
    
      std::stringstream ss(str);
      ss.imbue(loc);
      ss>>val;
      return ss.eof() && //all characters interpreted
             !ss.fail(); //nothing went wrong
    }
    

    Consider the following tests:

      double d=0;
      std::cout<<"1,1 parsed with German locale successfully :"<<s2d("1,1", d, std::locale("de_DE.utf8"))<<"\n";
      std::cout<<"value retrieved: "<<d<<"\n\n";
    
      d=0;
      std::cout<<"1,1 parsed with Classical locale successfully :"<<s2d("1,1", d, std::locale::classic())<<"\n";
      std::cout<<"value retrieved: "<<d<<"\n\n";
    
      d=0;
      std::cout<<"1.1 parsed with German locale successfully :"<<s2d("1.1", d, std::locale("de_DE.utf8"))<<"\n";
      std::cout<<"value retrieved: "<<d<<"\n\n";
    
      d=0;
      std::cout<<"1.1 parsed with Classical locale successfully :"<<s2d("1.1", d, std::locale::classic())<<"\n";
      std::cout<<"value retrieved: "<<d<<"\n\n";
    

    They show that only the first and the last conversions are successful:

    1,1 parsed with German locale successfully :1
    value retrieved: 1.1
    
    1,1 parsed with Classical locale successfully :0
    value retrieved: 1
    
    1.1 parsed with German locale successfully :0
    value retrieved: 11
    
    1.1 parsed with Classical locale successfully :1
    value retrieved: 1.1
    

    std::stringstream may not be the fastest, but it has its merits...