I'm trying to use ICU's StringCharacterIterator to copy (and possibly alter) characters from a source string to a destination string. However, I am getting unexpected results and am unsure why.
I would expect the final line of output of this program to be dog, but instead I get og￿
    #include <iostream>
    #include <string>
    #include <unicode/schriter.h>

    // ICU 61+ no longer puts its classes in the global namespace by default.
    using namespace icu;

    int main()
    {
        UnicodeString dog = UnicodeString::fromUTF8("dog");
        StringCharacterIterator chars(dog);
        UnicodeString copy;
        while(chars.hasNext())
            copy.append(chars.next32());
        for(int i=0; i<copy.countChar32(); i++)
        {
            int32_t charNumber = copy.char32At(i);
            std::cout << charNumber << "\n";
        }
        std::string stdString;
        copy.toUTF8String(stdString);
        std::cout << stdString << "\n";
    }
Program output:

    111
    103
    65535
    og￿

Unicode table:

    111 - latin small letter o
    103 - latin small letter g
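You have two problems:

1. StringCharacterIterator::hasNext returns false only once the iterator's current position has reached the end of the string. While the iterator sits on the last character, it still returns true.
2. StringCharacterIterator::next32 advances the current position of the iterator and then returns the code point at the new position. It is analogous to *(++it) for a raw pointer or standard library style iterator.

Taken together, this means you're skipping the first character of your string and reading one extra character past the end. When next32 runs off the end, it returns CharacterIterator::DONE (0xFFFF), which is the 65535 in your output; append stores it as U+FFFF, which is the trailing ￿ after "og".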
You can use next32PostInc, which behaves like *(it++) for a raw pointer or standard library iterator, instead of next32:

    while(chars.hasNext())
        copy.append(chars.next32PostInc());