I'm having trouble understanding how the IBM JVM's implementation of java.io.File deals with UTF-8 filenames on AIX with the JFS2 filesystem. I suspect there's a system property I'm overlooking, but I have not yet been able to find it.
Let's assume I have a file named othér (where é is U+00E9, or the UTF-8 bytes 0xc3 0xa9). The filename is encoded in UTF-8 and was created by a C program:
#include <fcntl.h>
/* "othér" with é written as its raw UTF-8 bytes 0xc3 0xa9 */
char filename[] = { 'o', 't', 'h', 0xc3, 0xa9, 'r', 0 };
open(filename, O_RDWR|O_CREAT, 0666);
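As a sanity check, the same name can be encoded from Java to confirm that the C array really is the UTF-8 form of "othér". The sketch below uses a hypothetical class name, Utf8Bytes, and sticks to String.getBytes(String) so it also compiles on Java 5. The second line shows the single byte 0xe9 that a JVM encoding filenames as ISO-8859-1 would presumably hand to the OS instead; that part is an assumption based on the listing further down.

```java
import java.io.UnsupportedEncodingException;

public class Utf8Bytes {
    // Returns the bytes of s in the named charset as space-separated hex, e.g. "0x6f 0x74".
    static String hexBytes(String s, String charset) {
        byte[] bytes;
        try {
            bytes = s.getBytes(charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append("0x").append(Integer.toHexString(b & 0xff)); // mask to print unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "oth\u00e9r"; // the same name the C program creates
        // prints "0x6f 0x74 0x68 0xc3 0xa9 0x72", identical to the C array
        System.out.println(hexBytes(name, "UTF-8"));
        // prints "0x6f 0x74 0x68 0xe9 0x72", the Latin-1 form of the same name
        System.out.println(hexBytes(name, "ISO-8859-1"));
    }
}
```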
If I create a Java string containing the equivalent Unicode name, Java fails to open the file. Further, File.listFiles() insists on treating the name as a Latin-1 string. For example:
import java.io.File;

public class FileTest
{
    public static void main(String[] args)
    {
        // "othér" as a Java string; 0xe9 is U+00E9
        String expectedName = new String(new char[] { 'o', 't', 'h', 0xe9, 'r' });
        File expected = new File(expectedName);
        if (expected.exists())
            System.out.println(expectedName + " exists");
        else
            System.out.println(expectedName + " DOES NOT exist");

        // List the current directory and dump each name's chars in hex
        for (File child : new File(".").listFiles())
        {
            System.out.println(child.getName());
            System.out.print("Chars:");
            for (char c : child.getName().toCharArray())
                System.out.print(" 0x" + Integer.toHexString((int) c));
            System.out.println();
        }
    }
}
The results of this program are:
% java -Dfile.encoding=UTF8 FileTest
othér DOES NOT exist
othér
Chars: 0x6f 0x74 0x68 0xc3 0xa9 0x72
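Those six chars are exactly what you get by decoding the C program's six bytes as ISO-8859-1, one char per byte. A minimal sketch (hypothetical class name Latin1Decode) reproduces the listing without touching the filesystem and contrasts it with a proper UTF-8 decode:

```java
import java.io.UnsupportedEncodingException;

public class Latin1Decode {
    // Formats a string's chars the same way FileTest does: "Chars: 0x.. 0x..".
    static String charsAsHex(String s) {
        StringBuilder sb = new StringBuilder("Chars:");
        for (char c : s.toCharArray())
            sb.append(" 0x").append(Integer.toHexString(c));
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The on-disk name written by the C program, byte for byte.
        byte[] onDisk = { 'o', 't', 'h', (byte) 0xc3, (byte) 0xa9, 'r' };

        String asLatin1 = new String(onDisk, "ISO-8859-1"); // one char per byte
        String asUtf8 = new String(onDisk, "UTF-8");        // 0xc3 0xa9 -> U+00E9

        System.out.println(charsAsHex(asLatin1)); // Chars: 0x6f 0x74 0x68 0xc3 0xa9 0x72
        System.out.println(charsAsHex(asUtf8));   // Chars: 0x6f 0x74 0x68 0xe9 0x72
    }
}
```

The first line matches the FileTest output, which is why the listing points to ISO-8859-1 as the JVM's filename encoding.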
So it appears that my filenames are being treated as Latin-1. I've tried setting the file.encoding system property to UTF8 and the client.encoding.override system property to UTF-8, to no avail. My LANG and LC_ALL settings are en_US.UTF-8:
% echo $LANG
en_US.UTF-8
% echo $LC_ALL
en_US.UTF-8
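To compare what the shell reports with what the JVM actually picked up, a small diagnostic can print the relevant system properties next to the locale variables. This is a sketch with a hypothetical class name, EncodingReport. sun.jnu.encoding is the property named in the answer; it is an internal property, so whether a given IBM J9 release defines it is an assumption, and it may print as null.

```java
public class EncodingReport {
    // System properties worth checking when filenames decode incorrectly.
    // Any of them may be undefined on a particular JVM, in which case "null" is printed.
    static final String[] PROPERTIES = {
        "file.encoding", "sun.jnu.encoding", "client.encoding.override", "java.version"
    };

    // Formats "name=value" for one system property.
    static String property(String name) {
        return name + "=" + System.getProperty(name);
    }

    public static void main(String[] args) {
        for (String name : PROPERTIES)
            System.out.println(property(name));
        // For comparison: the locale variables this JVM process was started with.
        System.out.println("LANG=" + System.getenv("LANG"));
        System.out.println("LC_ALL=" + System.getenv("LC_ALL"));
    }
}
```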
My system's "Primary Language Environment", as configured by SMIT, is "ISO8859-1". I don't really know the full impact of this setting, but I cannot change it. I suspect that changing it to "UTF8 English" might fix the problem, but since JFS2 stores filenames in Unicode and Java operates in Unicode internally, I feel like there should be a more general solution.
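Until the JVM's filename encoding can be changed, one possible stopgap is to translate names at the boundary. This sketch assumes, as the listFiles() output suggests, that the JVM maps filename bytes to chars as ISO-8859-1. That charset covers all 256 byte values, so encoding a name to UTF-8 and then decoding it as ISO-8859-1 makes the JVM write back the intended UTF-8 bytes. The class and method names (NameBridge, toJvmName, fromJvmName) are hypothetical, and the trick would corrupt names once the JVM really does use UTF-8 for filenames.

```java
import java.io.File;
import java.io.UnsupportedEncodingException;

public class NameBridge {
    // ASSUMPTION: the JVM decodes/encodes filenames as ISO-8859-1 (one char per byte).
    static final String JVM_FILENAME_CHARSET = "ISO-8859-1";

    // Encodes s with one charset and decodes the resulting bytes with another.
    static String recode(String s, String encodeAs, String decodeAs) {
        try {
            return new String(s.getBytes(encodeAs), decodeAs);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Real Unicode name -> the char sequence the JVM must see to emit UTF-8 bytes.
    static String toJvmName(String unicodeName) {
        return recode(unicodeName, "UTF-8", JVM_FILENAME_CHARSET);
    }

    // Name as returned by File.getName()/listFiles() -> real Unicode name.
    static String fromJvmName(String jvmName) {
        return recode(jvmName, JVM_FILENAME_CHARSET, "UTF-8");
    }

    public static void main(String[] args) {
        File f = new File(toJvmName("oth\u00e9r"));
        System.out.println(fromJvmName(f.getName()) + " exists? " + f.exists());
        File[] children = new File(".").listFiles();
        if (children != null)
            for (File child : children)
                System.out.println(fromJvmName(child.getName()));
    }
}
```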
Is there another J9 system property I can set that will force it to use UTF-8 filenames regardless of my SMIT setting?
AIX version is 5.2, Java version is IBM J9 (1.5.0), filesystem is JFS2:
rs6000% uname -a
AIX rs6000 2 5 000A9B7C4C00
rs6000% java -version
java version "1.5.0"
Java(TM) 2 Runtime Environment, Standard Edition (build pap32dev-20091106a (SR11 ))
IBM J9 VM (build 2.3, J2RE 1.5.0 IBM J9 2.3 AIX ppc-32 j9vmap3223-20091104 (JIT enabled)
J9VM - 20091103_45935_bHdSMr
JIT - 20091016_1845_r8
GC - 20091026_AA)
JCL - 20091106
rs6000% mount|grep /home
/dev/hd1 /home jfs2 Jun 27 16:02 rw,log=/dev/hd8
Update: this still occurs on Java 6:
% java -version
java version "1.6.0"
Java(TM) SE Runtime Environment (build pap3260sr11-20120806_01(SR11))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 AIX ppc-32 jvmap3260sr11-20120801_118201 (JIT enabled, AOT enabled)
J9VM - 20120801_118201
JIT - r9_20120608_24176ifx1
GC - 20120516_AA)
JCL - 20120713_01
I found the answer; it is described in a blog post about this exact issue. Try running your program with the -Dsun.jnu.encoding=UTF-8 flag set.
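To confirm the flag took effect, a check along these lines could be run as java -Dsun.jnu.encoding=UTF-8 JnuCheck (the class name is hypothetical). Because sun.jnu.encoding is an internal property, whether a particular JVM honours a command-line override of it is an assumption, so the sketch prints the value the JVM actually ended up with before retrying the plain Unicode name.

```java
import java.io.File;

public class JnuCheck {
    // True when the reported filename ("jnu") encoding is UTF-8.
    // Accepts both the "UTF-8" and "UTF8" spellings; the property may be null.
    static boolean filenamesAreUtf8(String jnuEncoding) {
        return jnuEncoding != null
            && jnuEncoding.replace("-", "").equalsIgnoreCase("UTF8");
    }

    public static void main(String[] args) {
        String jnu = System.getProperty("sun.jnu.encoding");
        System.out.println("sun.jnu.encoding=" + jnu);
        if (!filenamesAreUtf8(jnu))
            System.out.println("warning: filenames will probably not be decoded as UTF-8");

        // If the override is honoured, the plain Unicode name should now resolve.
        File expected = new File("oth\u00e9r");
        System.out.println(expected.getName()
            + (expected.exists() ? " exists" : " DOES NOT exist"));
    }
}
```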