
Converting between ISO-8859-1 and cp1251


My Android app uses an open-source library that only accepts text encoded as ISO-8859-1. A few of my users are from Eastern Europe and would like to enter cp1251-encoded text. This seems to be a limitation of the library itself, since Java fully supports both of these encodings, as well as Unicode.

One option would be to modify the open-source library to support multiple character sets. Alternatively, would it be possible to convert cp1251 to ISO-8859-1 and then back again? Since both are single-byte (8-bit) encodings, it seems like the same amount of data would be stored at the byte level. However, when the library loads the byte data into a string using ISO-8859-1, I suspect that any byte value not present in ISO-8859-1 would throw an exception.
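To make the byte-level idea concrete, here is a minimal sketch (the class and method names are hypothetical, chosen for illustration). It encodes the Russian word "Привет" as cp1251 and then decodes those bytes as ISO-8859-1, which is assumed to be roughly what the library does internally. In Java this decoding step does not throw: ISO-8859-1 assigns a character to every byte value from 0x00 to 0xFF. What goes wrong instead is that the Cyrillic letters come back as unrelated Latin-1 characters ("Ïðèâåò").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1Reinterpretation {
    // Windows-1251 has no constant in StandardCharsets, so look it up by name.
    static final Charset CP1251 = Charset.forName("windows-1251");

    // Assumption: the library turns its input bytes into a String like this.
    static String readAsLatin1(byte[] bytes) {
        // ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so no byte is rejected.
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // the Russian word "Privet" ("hello")
        byte[] cp1251Bytes = russian.getBytes(CP1251);
        String seenByLibrary = readAsLatin1(cp1251Bytes);
        for (char c : seenByLibrary.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
        // prints: U+00CF U+00F0 U+00E8 U+00E2 U+00E5 U+00F2
    }
}
```

The bytes survive unchanged, but any code in the library that treats the string as text (display, case conversion, sorting) will see Latin-1 letters, not Cyrillic ones.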

I'm not a character set expert, but the fact that I can't find code samples doing this conversion leads me to believe it won't work, at least not reliably.


Solution

  • You are correct that this won't work well at all. Most of the non-ASCII characters in CP1251 are not present in ISO-8859-1. (CP1251 is a Cyrillic code page used for Russian, Ukrainian, Bulgarian, Serbian and other languages written in Cyrillic; ISO-8859-1 is Western European, and contains a mix of accented Latin characters, punctuation, and symbols.) A few characters are represented in both, such as the no-break space, ©, § and the « » quotation marks, but there are so few of them (almost all punctuation or symbols) that it probably won't do you any good.
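A minimal sketch that checks this claim directly (class and method names are hypothetical). It decodes each of the 128 upper-half CP1251 byte values and asks an ISO-8859-1 encoder whether the resulting character is representable, then shows what happens when Cyrillic text is converted with `String.getBytes`. The exact set of shared characters depends on the platform's CP1251 table, so the count is printed rather than assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Cp1251Latin1Overlap {
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Returns the CP1251 characters at bytes 0x80-0xFF that ISO-8859-1 can also encode. */
    static List<Character> sharedHighCharacters() {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        List<Character> shared = new ArrayList<>();
        for (int b = 0x80; b <= 0xFF; b++) {
            // Decode a single byte using the CP1251 table
            // (a byte CP1251 leaves undefined becomes a replacement character).
            char c = new String(new byte[] {(byte) b}, CP1251).charAt(0);
            if (latin1.canEncode(c)) {
                shared.add(c);
            }
        }
        return shared;
    }

    /** Converts text to ISO-8859-1 bytes; getBytes substitutes '?' for unmappable characters. */
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        List<Character> shared = sharedHighCharacters();
        System.out.println(shared.size() + " of 128 upper-half CP1251 characters exist in ISO-8859-1");

        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442"; // the Russian word "Privet"
        byte[] latin1Bytes = toLatin1(russian);
        // Every Cyrillic letter is lost; prints: ??????
        System.out.println(new String(latin1Bytes, StandardCharsets.ISO_8859_1));
    }
}
```

Note that `String.getBytes` silently replaces unmappable characters rather than throwing; an encoder configured with `onUnmappableCharacter(CodingErrorAction.REPORT)` would instead raise an exception, which makes the data loss visible.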