In PHP
echo mb_strlen('🌦');
the result is 1
In Java on Android
"🌦".length()
the result is 2
Another way to write the same character, as an escaped UTF-16 surrogate pair
"\uD83C\uDF26".length()
the result is 2
Android Encoding
Charset defaultCharset = Charset.defaultCharset()
=> UTF-8
(new OutputStreamWriter(new ByteArrayOutputStream())).getEncoding()
=> UTF-8
File encoding is UTF-8.
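To separate the charset settings above from the length result, here is a minimal sketch comparing what String.length() counts with the number of UTF-8 bytes for the same character. The class and method names are hypothetical. It relies on Java strings being held as UTF-16 in memory, so the default charset only matters when converting to or from bytes.

```java
import java.nio.charset.StandardCharsets;

class EncodingVsLength {
    // U+1F326, written with escapes so the source file encoding does not matter.
    static final String ICON = "\uD83C\uDF26";

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(ICON.length());        // counts UTF-16 code units
        System.out.println(utf8ByteCount(ICON));  // counts UTF-8 bytes
    }
}
```

Neither number is the character count that mb_strlen reports, which suggests the encoding settings are not the cause of the mismatch.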
Questions
Why does Java on Android report a different length than mb_strlen?
I assume the mb_strlen result is correct and the length is 1. How can I make Java count this string as having length 1?
Later edit:
The problem is that I receive a string from a PHP server in this format:
LENGTH|STRING...
example: 5|juice3|aha3|yes
If the string contains '🌦', for example 7|sample🌦3|yes
then Java on Android counts that character as 2 instead of 1 and parses the string incorrectly
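The mismatch described above can be reproduced in isolation. This is a minimal sketch with a hypothetical class name, using the "sample" field from the example: length() counts UTF-16 code units, while codePointCount counts whole characters, which is the number the server sent as the prefix.

```java
class LengthDemo {
    // The 7-character field from the example; the icon is written as escapes.
    static final String FIELD = "sample\uD83C\uDF26";

    // What String.length() reports: UTF-16 code units.
    static int codeUnits(String s) {
        return s.length();
    }

    // Unicode code points, which matches the server-side length prefix here.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnits(FIELD));   // one more than the prefix
        System.out.println(codePoints(FIELD));  // equal to the prefix
    }
}
```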
Solution
Thank you all, the code point hint gave me a starting point.
While looping char by char through the text received from PHP:
changed int count = sb.length();
to int count = sb.codePointCount(0, sb.length());
changed char charAt = sb.charAt(i);
to int charAt = sb.codePointAt(i);
Most importantly, changed
String definition = sb.substring(i, i + defLength);
i += defLength - 1;
to
    // Reserve about 10% extra capacity, since some code points need two chars
    StringBuilder definitionBuilder = new StringBuilder(defLength + defLength / 10);
    int offset = 0;
    for (int times = 0; times < defLength; times++)
    {
        if (sb.length() > i + offset)
        {
            // Read one whole code point and advance by 1 or 2 chars accordingly
            int codepoint = sb.codePointAt(i + offset);
            definitionBuilder.appendCodePoint(codepoint);
            offset += Character.charCount(codepoint);
        }
        else
        {
            Debug.d("Out of bounds, i = " + i + ", offset = " + offset + ", times = " + times);
            break;
        }
    }
    String definition = definitionBuilder.toString();
    i += offset - 1;
The solution is not perfect, but it illustrates the fix.
Point #4 sometimes throws an out-of-bounds exception, but that may be caused by bad server data; hence the defensive check if (sb.length() > i + offset)
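As an alternative to the manual loop, the whole LENGTH|STRING format can be parsed with String.offsetByCodePoints, which moves an index forward by a given number of code points. This is only a sketch and not the code from the post. It assumes the prefix is a decimal code point count followed by '|', as in the examples, and the class and method names are hypothetical. Truncated or malformed input is reported as an IllegalArgumentException instead of being logged.

```java
import java.util.ArrayList;
import java.util.List;

class LengthPrefixedParser {
    // Splits input such as "5|juice3|aha3|yes" into its fields.
    static List<String> parse(String input) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int bar = input.indexOf('|', i);
            if (bar < 0) {
                throw new IllegalArgumentException("Missing '|' after offset " + i);
            }
            // NumberFormatException is itself an IllegalArgumentException.
            int count = Integer.parseInt(input.substring(i, bar));
            if (count < 0) {
                throw new IllegalArgumentException("Negative length at offset " + i);
            }
            int start = bar + 1;
            int end;
            try {
                // Advance by code points, so a surrogate pair counts once.
                end = input.offsetByCodePoints(start, count);
            } catch (IndexOutOfBoundsException e) {
                throw new IllegalArgumentException(
                        "Field at offset " + start + " has fewer than " + count + " code points", e);
            }
            fields.add(input.substring(start, end));
            i = end;
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("5|juice3|aha3|yes"));
        System.out.println(parse("7|sample\uD83C\uDF263|yes").size());
    }
}
```

Failing loudly on short input makes wrong server data visible at the point where it occurs, which may help narrow down the occasional out-of-bounds case mentioned above.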