I am trying to get to grips with parsing.
I have some data that comes in a de-de format with additional information at the end of the string.
I managed to get the de-de part correct, but I am struggling to get the - and % parsed correctly. I read up on codecvt, but I do not understand the topic.
Here is a reflection of what I understand so far and an example of what I need to do.
#include <string>
#include <locale>
#include <iostream>
#include <sstream>
using namespace std;
#define EXPECT_EQ(actual, expected) { \
    if (actual != expected) \
    { \
        cout << "expected " << #actual << " to be " << expected << " but was " << actual << endl; \
    } \
}
double parse(wstring numstr)
{
    double value;
    wstringstream is(numstr);
    is.imbue(locale("de-de"));
    is >> value;
    return value;
}
int main()
{
    EXPECT_EQ(parse(L"123"), 123); //ok
    EXPECT_EQ(parse(L"123,45"), 123.45); //ok
    EXPECT_EQ(parse(L"1.000,45"), 1000.45); //ok
    EXPECT_EQ(parse(L"2,390%"), 0.0239); //% sign at the end
    EXPECT_EQ(parse(L"1.234,56-"), -1234.56); //- sign at the end
}
The output is:
expected parse(L"2,390%") to be 0.0239 but was 2.39
expected parse(L"1.234,56-") to be -1234.56 but was 1234.56
How can I imbue my stream so that it reads the - and % signs the way I need it to?
I'd tackle this head-on: let's get to grips with parsing here.
You'd end up writing that somewhere anyway, so I'd forget about the need to create an (expensive) string stream first.
Weapon Of Choice: Boost Spirit
Note:
I parse the string using its iterators directly.
My code is pretty generic as to the type of floating-point number used. You can pretty much search-and-replace double with e.g. boost::multiprecision::cpp_dec_float (or make it a template argument) and keep parsing, because I predict that you need to parse decimal floating-point numbers, not binary floating-point numbers. With double, you are losing accuracy in the conversion.
UPDATE: extended sample Live On Coliru
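To illustrate the accuracy point, here is a small sketch that uses only the standard library. The helper name full_precision is made up for this example and is not part of the answer's code. It formats a double with 17 significant digits, which is enough to show the binary value that is actually stored, assuming the usual IEEE 754 double:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with 17 significant digits so that
// the stored binary approximation becomes visible.
std::string full_precision(double v)
{
    std::ostringstream os;
    os << std::setprecision(17) << v;
    return os.str();
}

// Decimal literals such as 0.1 are rounded to the nearest binary double,
// so sums of such values need not compare equal to the "obvious" result.
bool decimal_sum_is_exact()
{
    return 0.1 + 0.2 == 0.3;
}
```

With IEEE 754 doubles, full_precision(0.1) gives 0.10000000000000001 and decimal_sum_is_exact() returns false. A decimal type would store inputs like the ones in the question without this rounding.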
At its core, the grammar is really simple:
if (parse(numstr.begin(), numstr.end(), mynum >> matches['-'] >> matches['%'],
          value, sign, pct))
{
    if (sign) value = -value;
    if (pct) value /= 100;
    return value;
}
There you have it. Of course, we need to define mynum so it parses the unsigned real numbers as expected:
using namespace qi;
real_parser<double, de_numpolicy<double> > mynum;
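In Spirit, matches[p] always succeeds and yields a bool that says whether p matched, which is how the sign and pct flags above get their values. To make the role of those two flags concrete without Spirit, here is a rough standard-library sketch. The names Suffixes, split_suffixes and apply_suffixes are invented for this illustration, and the sketch only handles suffixes in the same order as the grammar (number, then optional '-', then optional '%'):

```cpp
#include <string>

// Hypothetical result type: the numeric text plus the two trailing flags.
struct Suffixes {
    std::string number;    // text before any trailing '-' / '%'
    bool negative = false; // a trailing '-' was present
    bool percent = false;  // a trailing '%' was present
};

// Works from the end of the string: '%' is the last character in the
// grammar, so it is stripped first, then the '-' that may precede it.
Suffixes split_suffixes(std::string s)
{
    Suffixes r;
    if (!s.empty() && s.back() == '%') { r.percent = true; s.pop_back(); }
    if (!s.empty() && s.back() == '-') { r.negative = true; s.pop_back(); }
    r.number = s;
    return r;
}

// Applies the flags in the same way as the snippet above.
double apply_suffixes(double value, bool negative, bool percent)
{
    if (negative) value = -value;
    if (percent) value /= 100;
    return value;
}
```

For example, split_suffixes("1.234,56-") leaves "1.234,56" as the number with negative set, and split_suffixes("2,390%") leaves "2,390" with percent set. The numeric part still has to be parsed separately.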
real_policies<>
The documentation goes a long way to explaining how to tweak real number parsing using real_policies. Here's the policy I came up with:
template <typename T>
struct de_numpolicy : qi::ureal_policies<T>
{
    // No exponent
    template <typename It> static bool parse_exp(It&, It const&) { return false; }
    template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; }
    // Thousands separated numbers
    template <typename It, typename Attr>
    static bool parse_n(It& first, It const& last, Attr& attr)
    {
        qi::uint_parser<unsigned, 10, 1, 3> uint3;   // leading group: 1 to 3 digits
        qi::uint_parser<unsigned, 10, 3, 3> uint3_3; // following groups: exactly 3 digits
        if (qi::parse(first, last, uint3, attr)) {
            // Accumulate every ".ddd" group into the integer part
            for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                attr = attr * 1000 + n;
            return true;
        }
        return false;
    }
    // The decimal separator is ',' instead of '.'
    template <typename It>
    static bool parse_dot(It& first, It const& last) {
        if (first == last || *first != ',')
            return false;
        ++first;
        return true;
    }
};
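The grouping logic in parse_n can be easier to follow when written out by hand. The following is only a sketch with the standard library, not what Spirit does internally. read_digits and read_grouped are hypothetical names, and the sketch works on std::string positions instead of generic iterators:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: reads between min_digits and max_digits decimal
// digits starting at pos. On success it advances pos and stores the value
// in out; on failure pos and out are left untouched.
bool read_digits(const std::string& s, std::size_t& pos,
                 int min_digits, int max_digits, unsigned long long& out)
{
    std::size_t p = pos;
    unsigned long long v = 0;
    int n = 0;
    while (p < s.size() && n < max_digits &&
           std::isdigit(static_cast<unsigned char>(s[p]))) {
        v = v * 10 + static_cast<unsigned>(s[p] - '0');
        ++p;
        ++n;
    }
    if (n < min_digits) return false;
    pos = p;
    out = v;
    return true;
}

// Integer part: 1 to 3 digits, then any number of ".ddd" groups.
bool read_grouped(const std::string& s, std::size_t& pos, unsigned long long& out)
{
    if (!read_digits(s, pos, 1, 3, out)) return false;
    while (pos < s.size() && s[pos] == '.') {
        std::size_t p = pos + 1;
        unsigned long long group = 0;
        if (!read_digits(s, p, 3, 3, group)) break; // leave the '.' unconsumed
        out = out * 1000 + group;
        pos = p;
    }
    return true;
}
```

Called on "1.000,45" with pos set to 0, read_grouped yields 1000 and stops at the comma, which is where parse_dot takes over in the policy above.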
#include <boost/spirit/include/qi.hpp>
#include <cassert>
#include <iostream>
#include <string>
#define EXPECT_EQ(actual, expected) { \
    double v = (actual); \
    if (v != expected) \
    { \
        std::cout << "expected " << #actual << " to be " << expected << " but was " << v << std::endl; \
    } \
}
namespace mylib {
    namespace qi = boost::spirit::qi;
    template <typename T>
    struct de_numpolicy : qi::ureal_policies<T>
    {
        // No exponent
        template <typename It> static bool parse_exp(It&, It const&) { return false; }
        template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; }
        // Thousands separated numbers
        template <typename It, typename Attr>
        static bool parse_n(It& first, It const& last, Attr& attr)
        {
            qi::uint_parser<unsigned, 10, 1, 3> uint3;
            qi::uint_parser<unsigned, 10, 3, 3> uint3_3;
            if (qi::parse(first, last, uint3, attr)) {
                for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                    attr = attr * 1000 + n;
                return true;
            }
            return false;
        }
        template <typename It>
        static bool parse_dot(It& first, It const& last) {
            if (first == last || *first != ',')
                return false;
            ++first;
            return true;
        }
    };
    template<typename Char, typename Traits, typename Alloc>
    double parse(std::basic_string<Char, Traits, Alloc> const& numstr)
    {
        using namespace qi;
        real_parser<double, de_numpolicy<double> > mynum;
        double value;
        bool sign, pct;
        if (parse(numstr.begin(), numstr.end(), mynum >> matches['-'] >> matches['%'],
                  value, sign, pct))
        {
            // std::cout << "DEBUG: " << std::boolalpha << " '" << numstr << "' -> (" << value << ", " << sign << ", " << pct << ")\n";
            if (sign) value = -value;
            if (pct) value /= 100;
            return value;
        }
        assert(false); // TODO handle errors
        return 0; // only reached when assertions are disabled
    }
} // namespace mylib
int main()
{
    EXPECT_EQ(mylib::parse(std::string("123")), 123); // ok
    EXPECT_EQ(mylib::parse(std::string("123,45")), 123.45); // ok
    EXPECT_EQ(mylib::parse(std::string("1.000,45")), 1000.45); // ok
    EXPECT_EQ(mylib::parse(std::string("2,390%")), 0.0239); // % sign at the end
    EXPECT_EQ(mylib::parse(std::string("1.234,56-")), -1234.56); // - sign at the end
}
If you uncomment the "DEBUG" line, it prints:
DEBUG: '123' -> (123, false, false)
DEBUG: '123,45' -> (123.45, false, false)
DEBUG: '1.000,45' -> (1000.45, false, false)
DEBUG: '2,390%' -> (2.39, false, true)
DEBUG: '1.234,56-' -> (1234.56, true, false)