java converters xwpf

Using Arabic and Persian in XWPFDocument


I want to convert a Word document containing Arabic letters to PDF. After a little R&D, I decided to use:

org.apache.poi.xwpf.converter.pdf.PdfConverter

But with the code below, the resulting PDF is laid out left to right and the letters are disconnected. For example, "سعید" appears in the PDF as "س ع ی د". My docx is very large and contains many paragraphs, not just one:

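import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ConvertWord {
    public static void ConvertToPDF(String docPath, String pdfPath) {
        // try-with-resources closes both streams even if the conversion fails
        try (InputStream doc = new FileInputStream(docPath);
             OutputStream out = new FileOutputStream(pdfPath)) {
            XWPFDocument document = new XWPFDocument(doc);

            PdfOptions options = PdfOptions.create();
            options.fontEncoding("UTF-8");

            PdfConverter.getInstance().convert(document, out, options);
        } catch (IOException ex) {
            // Also covers FileNotFoundException, which is a subclass of IOException
            ex.printStackTrace();
        }
    }

    public static void main(String[] args) {
        // ConvertToPDF is static, so it is called on the class, not on an instance
        ConvertWord.ConvertToPDF("D://" + "usc.docx", "D://test12.pdf");
    }
}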

Solution

  • There is a workaround for this issue: use the ICU4J library to shape the Arabic letters, then reverse the resulting string:

    String shaped = new StringBuilder(new ArabicShaping(ArabicShaping.LETTERS_SHAPE).shape(s))
    .reverse().toString();
    

    You might still run into problems with Persian characters, though. There was a known issue that someone fixed by patching ArabicShaping. I couldn't find the original link, but here is the patched code. (I uploaded the file to my Google Drive so that it doesn't get deleted over time.)

    Also, here is a link to the code and its differences from the original code.

    I changed the class name to PersianShaping for convenience.
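
Since the document in the question is large, the shape-then-reverse step would have to be applied to many strings, some of which may contain no Arabic or Persian at all. The following is only a rough sketch of one way to guard that step, using the JDK's java.text.Bidi to skip text with no right-to-left characters. VisualOrder and toVisualOrder are hypothetical names, not part of POI or ICU4J, and the shaper is passed in as a function so that either ArabicShaping or the patched PersianShaping could be plugged in. It assumes each string is either entirely right-to-left or entirely left-to-right; a string that mixes both would still be reversed as a whole.

```java
import java.text.Bidi;
import java.util.function.UnaryOperator;

/** Hypothetical helper for illustration; not part of POI or ICU4J. */
final class VisualOrder {

    private VisualOrder() {}

    /**
     * Shapes and reverses the text only when it contains right-to-left characters.
     * The shaper is supplied by the caller (assumption: in real use it wraps
     * ICU4J's ArabicShaping or the patched PersianShaping).
     */
    static String toVisualOrder(String text, UnaryOperator<String> shaper) {
        if (text == null || text.isEmpty()) {
            return text;
        }
        char[] chars = text.toCharArray();
        // requiresBidi is true when the range holds any right-to-left character
        if (!Bidi.requiresBidi(chars, 0, chars.length)) {
            return text; // pure left-to-right text is returned unchanged
        }
        String shaped = shaper.apply(text);
        // Same reversal as in the snippet above: logical order to visual order
        return new StringBuilder(shaped).reverse().toString();
    }
}
```

With ICU4J on the classpath, the shaper argument would wrap the shape call from the snippet above, including handling of the checked ArabicShapingException that it declares.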