cpdfmupdfbrotli

Why doesn't a PDF reader read a brotli-compressed PDF?


I have written a small program that reads one page from an input PDF, converts it into a bitmap and writes it to an output PDF file.
It works fine when write_PDF_options == "compress=yes" (the PDF writer uses deflate compression).
If I change the compression to brotli (write_PDF_options == "compress=brotli"), the program seems to run fine, but PDF viewers show an empty page (the standard PDF viewer in Windows 8, MS Edge in Windows 10, Firefox 115.21.0esr).

You can switch between the two by changing which const char* write_PDF_options line is commented out near the start of main() (in the --- settings --- block).

The question is: why doesn't the brotli-encoded output work? Is it a rare compression method, or does MuPDF compress incorrectly?

The program is:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#include <mupdf/fitz.h>
#include <mupdf/pdf.h>


int main(int argc, char **argv) {

    printf("-------------------- Test 2 (Brotli) -------------------\n");
    // ------ settings ------
    char input_pdf_fname[] = "d:\\test_pdf\\in.pdf";
    char output_pdf_fname[] = "d:\\test_pdf\\out_brotli.pdf";
    int cur_page_number = 0;    // page number to read
    //const char* write_PDF_options = "compress=yes";  // deflate - works well
    const char* write_PDF_options = "compress=brotli"; // brotli - viewers show an empty page
    // -----------------------

    // total pages in PDF
    int total_pages_in_pdf;

    fz_context *ctx;
    fz_pixmap *pix;

    /* Create a context to hold the exception stack and various caches. */
    ctx = fz_new_context(NULL, NULL, FZ_STORE_UNLIMITED);
    assert(ctx != nullptr);

    /* Register the default file types to handle. */
    fz_register_document_handlers(ctx);

    /* Open the document. */
    fz_document *doc = nullptr;
    fz_try(ctx)
        doc = fz_open_document(ctx, input_pdf_fname);
    fz_catch(ctx) {
        fz_report_error(ctx);
        exit(4);
    }

    pdf_document *pdf_doc = nullptr;
    fz_try(ctx)
        pdf_doc = pdf_specifics(ctx, doc);
    fz_catch(ctx) {
        fz_report_error(ctx);
        exit(4);
    }

    /* Count the number of pages. */
    fz_try(ctx)
        total_pages_in_pdf = pdf_count_pages(ctx, pdf_doc);
    fz_catch(ctx) { assert(0); }

    printf("Total PDF pages = %d \n", total_pages_in_pdf);

    if (cur_page_number < 0 || cur_page_number >= total_pages_in_pdf)
    {
        fprintf(stderr, "page number out of range: cur=%d (total = %d)\n",
            cur_page_number, total_pages_in_pdf);
        pdf_drop_document(ctx, pdf_doc);
        fz_drop_context(ctx);
        return EXIT_FAILURE;
    }

    // -------- new PDF -----------
    fz_document_writer *w;
    // Create a new pdf document
    fz_try(ctx)
        w = fz_new_pdf_writer(ctx, output_pdf_fname, write_PDF_options);
    fz_catch(ctx) {
        fz_report_error(ctx);
        exit(4);
    }

    printf("++++++   Copying page = %d ... ++++++\n", cur_page_number);
    printf("   From : %s \n", input_pdf_fname);
    printf("   To   : %s \n", output_pdf_fname);
    printf("   PDF write options   : %s \n", write_PDF_options);

    pdf_page *page = nullptr;
    fz_try(ctx)
        page = pdf_load_page(ctx, pdf_doc, cur_page_number);
    fz_catch(ctx) {
        page = nullptr;
        assert(0);
    }

    int ppi = 72;

    float scale_factor = (float)ppi / 72.0;

    int alpha = 0;

    // --------- create pixmap from pdf ----------------
    fz_try(ctx)
        pix = pdf_new_pixmap_from_page_contents_with_usage(
                ctx, page, fz_identity, fz_device_rgb(ctx), alpha, NULL, FZ_CROP_BOX);
    fz_catch(ctx) { assert(0); } 

    // Starts new page
    fz_device *out_device = nullptr;    // output device to write a content of new page

    fz_irect rect_of_png = fz_pixmap_bbox(ctx, pix);
    fz_rect rect_of_pdf = fz_make_rect(rect_of_png.x0, rect_of_png.y0, rect_of_png.x1, rect_of_png.y1);

    fz_try(ctx)
        out_device = fz_begin_page(ctx, w, rect_of_pdf); // was: mediabox_crop
    fz_catch(ctx) { assert(0); }


    fz_rect page_bounds = pdf_bound_page(ctx, page, FZ_CROP_BOX);

    int keep_alpha = 0;
    fz_pixmap *rgb_pix = fz_convert_pixmap(ctx, pix, fz_device_rgb(ctx), NULL, NULL, fz_default_color_params, keep_alpha);
    fz_drop_pixmap(ctx, pix);

    fz_matrix img_ctm = fz_scale(page_bounds.x1 / fz_pixmap_width(ctx, rgb_pix),
                                    page_bounds.y1 / fz_pixmap_height(ctx, rgb_pix));


    // write an image to the device
    fz_image *image = fz_new_image_from_pixmap(ctx, rgb_pix, NULL);
    float alpha_for_img = 1.0f;
    fz_fill_image(ctx, out_device, image, fz_scale(page_bounds.x1, page_bounds.y1), alpha_for_img, fz_default_color_params);

    // Ends new page
    fz_end_page(ctx, w);

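    fz_drop_image(ctx, image);
    fz_drop_pixmap(ctx, rgb_pix);   // 'pix' was already dropped after fz_convert_pixmap()
    rgb_pix = nullptr;
    pdf_drop_page(ctx, page);       // release the source page before its document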

    // Close the writer
    fz_close_document_writer(ctx, w);
    fz_drop_document_writer(ctx, w);

    pdf_drop_document(ctx, pdf_doc);
    fz_drop_context(ctx);
    return EXIT_SUCCESS;
}

Solution

  • PDF use of Brotli is currently still being rolled out, so the PDF reader I support (MuPDF based) does not show such files yet, but might later in the year.

    MuPDF has included experimental Brotli compression for roughly two years, but "traction" is slow in a large, established archival market, and there is not a lot to gain except in some cases. A PDF still has to be decompressed for viewing, and shorter decompression time usually gives a speed gain.

    This is the header of a PDF 2.0 file written by MuPDF with Brotli compression:

    %PDF-2.0
    %µ¶
    
    9 0 obj
    <</Length1 191532/Length 44805/Filter/BrotliDecode>>
    

    Most applications that can support Brotli-compressed PDFs will be biased towards editing, for example Tracker Software's PDF-XChange Editor. Similarly, the Ghostscript 10 viewer should be able to display them soon.


    However, although Adobe Acrobat DC is slated to support them, and Google developed Brotli, no current browser (not even the Adobe viewer in Chrome or Edge) seems able to view them.

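To confirm that an output file really uses Brotli (so that a blank page points to a viewer without a BrotliDecode decoder rather than to the program), one rough check is to scan the raw bytes for the filter names seen in the header sample above. This is a minimal sketch under stated assumptions: it only finds filter names written in plain object dictionaries, as in that sample (dictionaries packed into compressed object streams would be missed), it assumes the %PDF- header starts at byte 0, and the function names are my own, not part of MuPDF.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count occurrences of a byte pattern in a buffer. PDF files contain binary
   stream data (including NUL bytes), so strstr() cannot be used safely. */
size_t count_pattern(const unsigned char *buf, size_t len, const char *pat)
{
    assert(pat != NULL);
    size_t plen = strlen(pat);
    size_t n = 0;
    if (plen == 0 || len < plen)
        return 0;
    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0)
            n++;
    return n;
}

/* Copy the version from a "%PDF-x.y" header at byte 0 into out (e.g. "2.0").
   Returns 1 on success, 0 if the buffer does not start with such a header. */
int header_version(const unsigned char *buf, size_t len, char *out, size_t outlen)
{
    static const char magic[] = "%PDF-";
    const size_t mlen = sizeof magic - 1;
    size_t i = 0;
    if (outlen == 0 || len < mlen || memcmp(buf, magic, mlen) != 0)
        return 0;
    while (mlen + i < len && i + 1 < outlen) {
        unsigned char c = buf[mlen + i];
        if (c != '.' && (c < '0' || c > '9'))
            break;
        out[i++] = (char)c;
    }
    out[i] = '\0';
    return i > 0;
}

/* Read a whole file into a malloc'd buffer; returns NULL on failure. */
unsigned char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    unsigned char *buf = NULL;
    size_t len = 0, cap = 0;
    for (;;) {
        if (len == cap) {   /* grow the buffer geometrically */
            size_t ncap = cap ? cap * 2 : 65536;
            unsigned char *nbuf = realloc(buf, ncap);
            if (!nbuf) { free(buf); fclose(f); return NULL; }
            buf = nbuf;
            cap = ncap;
        }
        size_t got = fread(buf + len, 1, cap - len, f);
        if (got == 0)
            break;
        len += got;
    }
    if (ferror(f)) { free(buf); fclose(f); return NULL; }
    fclose(f);
    *len_out = len;
    return buf;
}

/* Print the header version and how often each filter name appears. */
void report_filters(const unsigned char *buf, size_t len)
{
    char version[16];
    if (header_version(buf, len, version, sizeof version))
        printf("header version: %s\n", version);
    else
        printf("no %%PDF- header at start of file\n");
    printf("/FlateDecode  : %zu\n", count_pattern(buf, len, "/FlateDecode"));
    printf("/BrotliDecode : %zu\n", count_pattern(buf, len, "/BrotliDecode"));
}
```

For example, call read_whole_file() on d:\test_pdf\out_brotli.pdf, pass the buffer to report_filters(), then free() it. A non-zero /BrotliDecode count for a file that renders blank in Edge or Firefox would point at missing viewer support rather than at the writer; the same check on the compress=yes output would be expected to report /FlateDecode instead.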