python pdf jupyter nbconvert

Nbconvert mode "webpdf" creates undesired new PDF pages when a Jupyter notebook runs a Python script


I've created a fairly complex Python script that can be run directly from any Jupyter notebook. The structure of my Jupyter notebook is:

<Cell with %run <myscript.py> >

The script contains several print(f'...') calls. When I run nbconvert from the command line to create a PDF with --to webpdf, I get a result like the following, and I don't know why:

PAGE 1 = MarkDown1 and A LOT OF WHITE SPACE

PAGE 2 and following = all the prints from %run <myscript.py> in the right sequence

...

LAST PAGE = MarkDown2

I don't understand why nbconvert --to webpdf starts a new page for the output of myscript.py, even though there is a lot of white space left after the first Markdown cell.

The print() calls in myscript.py usually produce one or two lines each, so the renderer could, in theory, place some of that output right after the first Markdown cell instead of starting a new page.

I don't understand this behavior, nor on what basis webpdf decides whether to start a new page.


Solution

  • The webpdf target does the following:

    1. converts the notebook to HTML
    2. uses a browser (headless Chromium, IIRC) to "print to PDF"

    To better understand why it is rendered like this, you could use the html target, open the result in Chrome or Chromium, and open the print dialog with Ctrl-P. If you want a deeper dive, you can "inspect element" and turn on print preview mode (as explained in this question's answers).

    What is happening, I believe, is related to how browsers lay out elements when printing. Basically (depending on the CSS), they use relatively simple heuristics to decide where to break pages.

    In this case, the output of the %run cell is one big block of text. Since it doesn't fit in the rest of the page, the browser prefers to break before that block, to minimize the number of breaks inside it.

    To change this behaviour, you can add custom CSS that overrides the values of the break-before and break-inside properties for cells.

    Another option is to use the LaTeX-based pdf target, which should handle whitespace more sensibly.
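
As a rough illustration of the custom-CSS route, here is a minimal sketch that post-processes the HTML produced by the html target and injects a print stylesheet before you print it from the browser. The selectors (.jp-Cell, .jp-OutputArea, .jp-OutputArea-child) are an assumption about the class names used by nbconvert's default HTML template; check them with "inspect element" on your own export and adjust. The helper names (html_export_command, inject_print_css, patch_file) are made up for this example and are not part of nbconvert.

```python
import re
from pathlib import Path

# ASSUMPTION: these class names match the cells/outputs in your exported HTML.
# Verify them with the browser's "inspect element" and change as needed.
PRINT_CSS = """
@media print {
  .jp-Cell, .jp-OutputArea, .jp-OutputArea-child {
    break-before: auto !important;
    break-inside: auto !important;
  }
}
"""


def html_export_command(notebook: str) -> list:
    """Command line for the first step of webpdf: notebook -> HTML."""
    return ["jupyter", "nbconvert", "--to", "html", notebook]


def inject_print_css(html: str, css: str = PRINT_CSS) -> str:
    """Insert a <style> block right before </head>.

    Placing it last in the head lets it come after the template's own
    styles. If no </head> is found, the style block is simply prepended.
    """
    style = f"<style>{css}</style>\n"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style + html
    return html[:match.start()] + style + html[match.start():]


def patch_file(src: str, dst: str) -> None:
    """Read an exported HTML file and write a copy with the print CSS added."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(inject_print_css(text), encoding="utf-8")
```

Typical use would be to run the command returned by html_export_command("notebook.ipynb") (for example with subprocess.run), call patch_file("notebook.html", "notebook_print.html"), then open the patched file in Chrome or Chromium and check the Ctrl-P preview. Allowing breaks inside the output block means a long output may now be split across pages, which is the trade-off for filling the white space on page 1.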