
Create Accessible PDF files with iText


I recently downloaded a trial license of iText. I am trying to achieve the following goals:

I tried the following code (C#):

    LicenseKey.LoadLicenseFile(@"D:\Development\itextkey-0.xml");
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST, new WriterProperties().SetPdfVersion(PdfVersion.PDF_1_7)));
    pdfDoc.SetTagged();
    pdfDoc.GetCatalog().SetLang(new PdfString("HE-IL"));
    pdfDoc.GetCatalog().SetViewerPreferences(
            new PdfViewerPreferences().SetDisplayDocTitle(true));
    PdfDocumentInfo info = pdfDoc.GetDocumentInfo();
    info.SetTitle("iText7 PDF/UA example");
    pdfDoc.Close();

However, when I check the output file in Acrobat Reader, it is still reported as a "Not Tagged" PDF.

Please advise how I should use iText to achieve my goals.


Solution

  • Can't be done.

    Let me give you the easiest proof:
    Suppose the input document contains an image of two cats fighting over a ball of yarn.

    PDF/UA requires you to provide sensible alternative text for your images.
    There is currently no system available that is able to provide a sensible caption for any random image you throw at it.

    Not to mention that whatever system comes up with captions for images would have to be linked to a perfect translation service, since most image recognition services work in English, and that might not be the language you write your documents in. That also implies you need a system capable of detecting the language you are writing in.

    We've now added three insanely hard problems (image captioning, machine translation, and language detection), simply to be able to handle images.

    Now imagine the other kind of fun stuff, like

    Furthermore, PDF/UA requires fonts to be embedded. What if you are faced with a PDF that uses fonts that aren't embedded? Do you have access to font programs that can be used to substitute those fonts?

    In your snippet, you use PdfReader, and you provide a path to a file SRC. You need to convert Word, PPT, and other files, but iText doesn't convert Word, PPT, etc. to PDF. PdfReader only accepts PDF files (as the name indicates).
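
To illustrate that last point, here is a minimal sketch of a pre-check you could run before handing a path to PdfReader. It uses only the .NET standard library, not iText, and the names PdfInputCheck and LooksLikePdf are hypothetical helpers made up for this example. It assumes the file starts directly with the "%PDF-" signature, which is how a well-formed PDF begins. A .docx or .pptx file is a ZIP archive and starts with "PK" instead, so it fails the check.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of iText.
public static class PdfInputCheck
{
    // Returns true when the file begins with the "%PDF-" signature.
    // Assumption: the signature sits at byte offset 0.
    public static bool LooksLikePdf(string path)
    {
        byte[] expected = Encoding.ASCII.GetBytes("%PDF-");
        byte[] header = new byte[expected.Length];

        using (FileStream stream = File.OpenRead(path))
        {
            int read = 0;
            while (read < header.Length)
            {
                int n = stream.Read(header, read, header.Length - read);
                if (n == 0)
                {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        }

        for (int i = 0; i < expected.Length; i++)
        {
            if (header[i] != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

A passing check only tells you the input claims to be a PDF. It says nothing about whether the file is tagged, has embedded fonts, or carries alternative text, so the problems described above remain.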