java linux command-line extended-ascii

Problems using extended ASCII characters as parameters on the Linux command line


I'm trying to pass some strings as arguments to a .jar file that I execute from the command line on Debian Linux. Some of the strings contain extended ASCII characters such as the copyright symbol or the letter ü.

java -jar someJar_CL.jar arg1 arg2 'Lizenziert für foo © foobar' 

On Windows, using PowerShell, everything works fine and the .jar file runs as expected. On Linux, however, I get the following exception:

java.lang.IllegalArgumentException: U+FFFD ('.notdef') is not available in this font Helvetica encoding: WinAnsiEncoding
        at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:426)
        at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:342)
        at org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:373)
        at watermark.app.AddWatermarkToFile.watermarkPdf(AddWatermarkToFile.java:101)
        at watermark.app.AddWatermarkToFile.watermarkPdfs(AddWatermarkToFile.java:51)
        at watermark.gui.BatchWatermarkPDFFile.main(BatchWatermarkPDFFile.java:113)

As I understand it, this exception means that the program has a problem with the extended ASCII characters. If I remove them, it runs correctly on Linux.

I have no direct access to the source code of the .jar file, but I don't think that's necessary, since it runs correctly on Windows (it all runs in the JRE regardless of the OS).

I didn't expect it to be the solution, but I installed the MS fonts with apt-get install msttcorefonts. It didn't change anything.

How can I fix this issue? Does it have anything to do with the Helvetica font? Would it work with a different font on Linux? I can contact the developer of the .jar to ask for changes, but only if it is really necessary.

Thanks in advance.


Solution

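  • Since PDFBox complains about U+FFFD (the Unicode replacement character), we can safely say that something went wrong before the String was handed to the PDFBox library: U+FFFD is what a decoder substitutes for bytes it cannot interpret, so the argument was already damaged when Java decoded it.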

    The issue seems to be how Java interprets the bytes coming in via the command line (the parameters). On Linux it uses the locale information to decide how to decode command-line parameters, which the OS provides as plain byte strings with no indication of their encoding.

    If you don't have a locale configured, Java can fall back to the POSIX locale and use ASCII encoding. You can fix this in one of two ways:

    1. Set up your locale (most directly the LANG environment variable) to a locale that uses UTF-8 encoding, for example en_US.UTF-8 if that locale is generated on your system.

      You can do this either globally or just for the single invocation of java, by prefixing the command with the variable assignment.

    2. Set the sun.jnu.encoding system property (e.g. -Dsun.jnu.encoding=UTF-8 on the java command line) to explicitly tell Java how to decode command-line arguments.

      This option seems to be poorly documented and not standardized, so it might not work with non-Oracle VMs.
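
To check which of the two cases you are in, a small diagnostic class can help. The following is only a sketch: the class name ArgEncodingCheck and its helper methods are made up for illustration and are not part of the .jar in question or of PDFBox. It prints the locale variables and the encodings the JVM picked up, dumps every received argument as code points, and then simulates the suspected failure by decoding the UTF-8 bytes of "für ©" as US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper, not part of the original application.
class ArgEncodingCheck {

    /** Decodes raw bytes with the given charset; undecodable bytes become U+FFFD. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Returns true if the string contains the Unicode replacement character U+FFFD. */
    static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    /** Renders each code point as U+XXXX so damage is visible on any terminal. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Locale variables as seen by the JVM, and the encodings it derived at startup.
        System.out.println("LANG             = " + System.getenv("LANG"));
        System.out.println("LC_ALL           = " + System.getenv("LC_ALL"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));

        // The command-line arguments exactly as Java decoded them.
        for (String arg : args) {
            String status = hasReplacementChar(arg) ? "DAMAGED " : "ok      ";
            System.out.println(status + codePoints(arg));
        }

        // Simulation: the UTF-8 bytes of "für ©", decoded correctly and as ASCII.
        byte[] raw = "f\u00FCr \u00A9".getBytes(StandardCharsets.UTF_8);
        System.out.println("as UTF-8:    " + codePoints(decode(raw, StandardCharsets.UTF_8)));
        System.out.println("as US-ASCII: " + codePoints(decode(raw, StandardCharsets.US_ASCII)));
    }
}
```

Compile it and run it with the same arguments and in the same shell as the failing command. If the arguments are reported as DAMAGED and sun.jnu.encoding shows an ASCII variant, the locale is the culprit rather than the font; after setting LANG to a UTF-8 locale the same arguments should be reported as ok.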