We are receiving an EBCDIC mainframe file over XCOM in binary format. Currently, a legacy C-based application converts it to a readable ASCII format. This is how the file looks now:
As part of the migration, we have to move the legacy application to Java. Can you please suggest an approach, or share a link, on how to convert that binary file to a readable format in Java?
EBCDIC - like ASCII or Latin-1 - is text. You can try one of the charsets "Cp037", "Cp500" or "Cp1047". As there is more than one EBCDIC variant, check Wikipedia or a similar reference. Unfortunately, not every Charset is provided by Java SE. See Convert String from ASCII to EBCDIC in Java?
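Which charsets are actually available depends on the runtime (on modular JDKs the extended IBM code pages typically come from the jdk.charsets module), so a quick check like the following - just a sketch with the three candidate names - tells you what your installation provides:
// Check which of the candidate EBCDIC code pages this runtime provides
for (String name : new String[] {"Cp037", "Cp500", "Cp1047"}) {
    System.out.println(name + " supported: " + Charset.isSupported(name));
}
// Charset.forName(name) would throw UnsupportedCharsetException for a missing code page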
Since Java 11 you can use Files.readString/writeString; otherwise one needs to use Files.readAllBytes.
Path ebcdicPath = Paths.get("...");
Path utf8Path = ebcdicPath.resolveSibling("utf8.txt");
Charset ebcdic = Charset.forName("Cp1047");
// Read the whole file decoded as EBCDIC and write it back out encoded as UTF-8
String content = Files.readString(ebcdicPath, ebcdic);
Files.writeString(utf8Path, content, StandardCharsets.UTF_8);
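If you are not yet on Java 11, a minimal sketch of the same conversion with Files.readAllBytes/Files.write (same placeholder paths as above) would be:
Path ebcdicPath = Paths.get("...");
Path utf8Path = ebcdicPath.resolveSibling("utf8.txt");
Charset ebcdic = Charset.forName("Cp1047");
// Decode the raw EBCDIC bytes to a String, then re-encode as UTF-8
byte[] raw = Files.readAllBytes(ebcdicPath);
String content = new String(raw, ebcdic);
Files.write(utf8Path, content.getBytes(StandardCharsets.UTF_8));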
You might run into problems with the line endings, as the EBCDIC-originating NEL (U+0085) is a legal newline/carriage return in Unicode. Using Files.lines would strip the line endings.
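If the decoded text does contain NEL, one option - a sketch reusing the content and utf8Path variables from the first snippet - is to normalize it to '\n' before writing:
// Replace Unicode NEL (U+0085), which EBCDIC NL may decode to, with a plain '\n'
String normalized = content.replace('\u0085', '\n');
Files.writeString(utf8Path, normalized, StandardCharsets.UTF_8);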
To check what actually arrived, you can dump the first bytes in hex:
Path path = Paths.get("...");
byte[] content = Files.readAllBytes(path);
// Print the first 16 bytes (or fewer for a short file) as hex
for (int i = 0; i < Math.min(16, content.length); ++i) {
    System.out.printf(" %02x", content[i] & 0xFF);
}
System.out.println();
For example, a row of EBCDIC digit bytes (0xF0 through 0xF9 encode the digits '0' through '9') decodes like this:
byte[] c = {(byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf9, (byte)0xf7, (byte)0xf7,
            (byte)0xf1, (byte)0xf2, (byte)0xf2, (byte)0xf0, (byte)0xf3, (byte)0xf2, (byte)0xf1, (byte)0xf0};
Charset ebcdic = Charset.forName("Cp1047");
System.out.println(new String(c, ebcdic));
which prints
0000097712203210