c++ parsing iostream facet codecvt

parsing strings with value modifiers ('-', '%') at the end


I'm trying to get to grips with parsing.

I have some data that comes in German (de-de) number format, with additional information at the end of the string.

I managed to get the de-de part right, but I'm struggling to get the trailing - and % parsed correctly. I read up on codecvt, but I don't understand the topic.

Here is what I understand so far, along with an example of what I need to do.

#include <string>
#include <locale>
#include <iostream>
#include <sstream>

using namespace std;

#define EXPECT_EQ(actual, expected) { \
    if (actual != expected) \
    { \
        cout << "expected " << #actual << " to be " << expected << " but was " << actual << endl; \
    } \
}

double parse(wstring numstr)
{
    double value;
    wstringstream is(numstr);
    is.imbue(locale("de-de")); // Windows-style locale name; glibc expects e.g. "de_DE.UTF-8"
    is >> value;
    return value;
}

int main()
{
    EXPECT_EQ(parse(L"123"), 123); //ok
    EXPECT_EQ(parse(L"123,45"), 123.45); //ok
    EXPECT_EQ(parse(L"1.000,45"), 1000.45); //ok
    EXPECT_EQ(parse(L"2,390%"), 0.0239); //% sign at the end
    EXPECT_EQ(parse(L"1.234,56-"), -1234.56); //- sign at the end
}

The output is:

expected parse(L"2,390%") to be 0.0239 but was 2.39
expected parse(L"1.234,56-") to be -1234.56 but was 1234.56

How can I imbue my stream so that it reads the - and % signs the way I need?


Solution

  • I'd tackle this head-on: let's get to grips with parsing here.

    You'll end up writing a parser somewhere anyway, so I'd forget about creating an (expensive) string stream first.

    Weapon Of Choice: Boost Spirit

    Note:

    • I parse the string using its iterators directly. My code is fairly generic as to the floating-point type used.

    • You can pretty much search-and-replace double with, e.g., a boost::multiprecision::cpp_dec_float based type such as cpp_dec_float_50 (or make it a template argument) and keep parsing. I suspect you actually need to parse decimal floating-point numbers, not binary ones: converting to double loses accuracy.
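    To illustrate the decimal-versus-binary point without pulling in Boost.Multiprecision, here is a minimal hand-rolled sketch (C++17). The Decimal type and parse_decimal function are hypothetical, not part of Boost or the standard library. It keeps the digits as an integer mantissa plus a decimal scale, so "2,390%" becomes exactly 2390 × 10^-5 instead of a rounded double. It deliberately skips thousands-group validation and overflow checks.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical exact decimal: value == mantissa * 10^(-scale).
// No binary rounding happens while parsing.
struct Decimal {
    long long mantissa = 0;
    int scale = 0;  // number of digits after the decimal point
};

// Parses a de-DE style number ("1.234,56", "2,390") followed by an
// optional '-' and then an optional '%' into an exact Decimal.
// Thousands separators are skipped without checking group sizes,
// and overflow of the mantissa is not detected (sketch only).
std::optional<Decimal> parse_decimal(const std::string& s) {
    Decimal d;
    std::size_t i = 0;
    bool any_digit = false, in_fraction = false;
    for (; i < s.size(); ++i) {
        char c = s[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            d.mantissa = d.mantissa * 10 + (c - '0');
            if (in_fraction) ++d.scale;
            any_digit = true;
        } else if (c == '.' && !in_fraction) {
            continue;            // thousands separator: ignore
        } else if (c == ',' && !in_fraction) {
            in_fraction = true;  // decimal comma
        } else {
            break;               // start of the suffix (or garbage)
        }
    }
    if (!any_digit) return std::nullopt;
    if (i < s.size() && s[i] == '-') { d.mantissa = -d.mantissa; ++i; }
    if (i < s.size() && s[i] == '%') { d.scale += 2; ++i; }  // divide by 100 exactly
    if (i != s.size()) return std::nullopt;  // reject trailing characters
    return d;
}
```

    With this representation "1.234,56-" is the pair (-123456, 2), and no precision is lost until you decide to convert to a binary type.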

    UPDATE: extended sample Live On Coliru

    The Simple Grammar

    At its core, the grammar is really simple:

    if (parse(numstr.begin(), numstr.end(), mynum >> matches['-'] >> matches['%'],
                value, sign, pct)) 
    {
        if (sign) value = -value;
        if (pct)  value /= 100;
    
        return value;
    }
    

    There you have it. Of course, we still need to define mynum so that it parses unsigned real numbers as expected:

    using namespace qi;
    real_parser<double, de_numpolicy<double> > mynum;
    
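    The two matches[] directives never fail: each one simply reports whether its character was present, and they are tried in a fixed order, '-' first, then '%'. As a plain C++ illustration of that step (split_suffix and apply_modifiers are hypothetical helpers, not part of Spirit; needs C++17 for std::string_view), here is a sketch that peels the markers off the end of the string and then applies them the same way the grammar's post-processing does:

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper mirroring `mynum >> matches['-'] >> matches['%']`:
// each marker is optional, and only the order "-" then "%" is recognized.
struct Split {
    std::string_view number;  // the part handed to the real-number parser
    bool minus = false;
    bool percent = false;
};

Split split_suffix(std::string_view s) {
    Split r;
    // We strip from the back, so the last marker ('%') is checked first.
    if (!s.empty() && s.back() == '%') { r.percent = true; s.remove_suffix(1); }
    if (!s.empty() && s.back() == '-') { r.minus = true;   s.remove_suffix(1); }
    r.number = s;
    return r;
}

// Same post-processing as in the grammar snippet above.
double apply_modifiers(double value, bool minus, bool percent) {
    if (minus)   value = -value;
    if (percent) value /= 100;
    return value;
}
```

    Note that "5%-" is not split into both flags: the '%' stays in the number part, so the number parser would then reject it.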

    The Magic: real_policies<>

    The documentation goes a long way to explaining how to tweak real number parsing using real_policies. Here's the policy I came up with:

    template <typename T>
        struct de_numpolicy : qi::ureal_policies<T>
    {
        //  No exponent
        template <typename It>                static bool parse_exp(It&, It const&)          { return false; } 
        template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; } 
    
        //  Thousands separated numbers
        template <typename It, typename Attr>
        static bool parse_n(It& first, It const& last, Attr& attr)
        {
            qi::uint_parser<unsigned, 10, 1, 3> uint3;
            qi::uint_parser<unsigned, 10, 3, 3> uint3_3;
    
            if (qi::parse(first, last, uint3, attr)) {
                for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                    attr = attr * 1000 + n;
    
                return true;
            }
    
            return false;
        }
    
        template <typename It>
            static bool parse_dot(It& first, It const& last) {
                if (first == last || *first != ',')
                    return false;
                ++first;
                return true;
            }
    };
    
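    To make the grouping rule in parse_n concrete, here is a plain C++ sketch of the same rule without Spirit: one to three leading digits, then any number of groups made of '.' followed by exactly three digits. parse_grouped_int is a hypothetical name; it mimics, but is not, the Spirit implementation. As with Spirit's sequence parser, a '.' group that doesn't match is left unconsumed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hand-written version of the rule in de_numpolicy::parse_n.
// On success, `pos` is moved past the integer part and `out` holds its value.
bool parse_grouped_int(const std::string& s, std::size_t& pos, unsigned long long& out) {
    // True if s[from .. from+n) exists and consists only of digits.
    auto digits_at = [&](std::size_t from, std::size_t n) {
        if (from + n > s.size()) return false;
        for (std::size_t k = from; k < from + n; ++k)
            if (!std::isdigit(static_cast<unsigned char>(s[k]))) return false;
        return true;
    };

    std::size_t i = pos;
    out = 0;
    // Leading group: 1 to 3 digits (like uint_parser<unsigned, 10, 1, 3>).
    while (i < s.size() && i - pos < 3 && std::isdigit(static_cast<unsigned char>(s[i])))
        out = out * 10 + (s[i++] - '0');
    if (i == pos) return false;

    // Following groups: '.' plus exactly three digits (like '.' >> uint3_3).
    while (i < s.size() && s[i] == '.' && digits_at(i + 1, 3)) {
        out = out * 1000 + std::stoull(s.substr(i + 1, 3));
        i += 4;
    }
    pos = i;
    return true;
}
```

    One consequence worth knowing: "1234" only yields 123 here, because the leading group is capped at three digits; the remaining "4" is left for whatever comes next.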

    Full Demo

    Live On Coliru

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <stdexcept>
    #include <string>
    
    
    #define EXPECT_EQ(actual, expected) { \
        double v = (actual); \
        if (v != expected) \
        { \
            std::cout << "expected " << #actual << " to be " << expected << " but was " << v << std::endl; \
        } \
    }
    
    namespace mylib {
        namespace qi = boost::spirit::qi;
    
        template <typename T>
            struct de_numpolicy : qi::ureal_policies<T>
        {
            //  No exponent
            template <typename It>                static bool parse_exp(It&, It const&)          { return false; } 
            template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; } 
    
            //  Thousands separated numbers
            template <typename It, typename Attr>
            static bool parse_n(It& first, It const& last, Attr& attr)
            {
                qi::uint_parser<unsigned, 10, 1, 3> uint3;
                qi::uint_parser<unsigned, 10, 3, 3> uint3_3;
    
                if (qi::parse(first, last, uint3, attr)) {
                    for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                        attr = attr * 1000 + n;
    
                    return true;
                }
    
                return false;
            }
    
            template <typename It>
                static bool parse_dot(It& first, It const& last) {
                    if (first == last || *first != ',')
                        return false;
                    ++first;
                    return true;
                }
        };
    
        template<typename Char, typename Traits, typename Alloc>
        double parse(std::basic_string<Char, Traits, Alloc> const& numstr)
        {
            using namespace qi;
            real_parser<double, de_numpolicy<double> > mynum;
    
            double value;
            bool sign, pct;
    
            auto first = numstr.begin(), last = numstr.end();
            if (qi::parse(first, last, mynum >> matches['-'] >> matches['%'],
                        value, sign, pct) && first == last) // reject trailing garbage
            {
                // std::cout << "DEBUG: " << std::boolalpha << " '" << numstr << "' -> (" << value << ", " << sign << ", " << pct << ")\n";
                if (sign) value = -value;
                if (pct)  value /= 100;
    
                return value;
            }
    
            throw std::invalid_argument("not a valid de-DE number"); // TODO: better error handling
        }
    
    } // namespace mylib
    
    int main()
    {
        EXPECT_EQ(mylib::parse(std::string("123")),       123);      // ok
        EXPECT_EQ(mylib::parse(std::string("123,45")),    123.45);   // ok
        EXPECT_EQ(mylib::parse(std::string("1.000,45")),  1000.45);  // ok
        EXPECT_EQ(mylib::parse(std::string("2,390%")),    0.0239);   // %  sign at the end
        EXPECT_EQ(mylib::parse(std::string("1.234,56-")), -1234.56); // -  sign at the end
    }
    

    If you uncomment the "DEBUG" line, it prints:

    DEBUG:  '123' -> (123, false, false)
    DEBUG:  '123,45' -> (123.45, false, false)
    DEBUG:  '1.000,45' -> (1000.45, false, false)
    DEBUG:  '2,390%' -> (2.39, false, true)
    DEBUG:  '1.234,56-' -> (1234.56, true, false)
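    For completeness, since the question asked about imbuing the stream: the facet that governs number punctuation is std::numpunct, not std::codecvt (codecvt converts between character encodings and doesn't affect number parsing). Below is a minimal sketch, assuming char input (a wchar_t version would derive from std::numpunct<wchar_t> instead), that keeps the stream approach. It uses a custom numpunct facet so it doesn't depend on an installed German locale, and it handles the markers after extraction. de_punct and parse_with_stream are hypothetical names.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// German-style punctuation: ',' as decimal point, '.' as thousands separator
// in groups of three.
struct de_punct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical stream-based variant: num_get reads the number and stops at
// the first character it can't use; we then inspect what is left.
double parse_with_stream(const std::string& numstr) {
    std::istringstream is(numstr);
    // The locale takes ownership of the facet.
    is.imbue(std::locale(std::locale::classic(), new de_punct));

    double value = 0;
    if (!(is >> value))
        throw std::invalid_argument("not a number: " + numstr);

    if (is.peek() == '-') { is.get(); value = -value; }
    if (is.peek() == '%') { is.get(); value /= 100; }
    if (is.peek() != std::char_traits<char>::eof())
        throw std::invalid_argument("unexpected trailing characters: " + numstr);
    return value;
}
```

    This still goes through binary double, so the accuracy caveat above applies equally here.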