python selenium-webdriver geckodriver

Python, Selenium: Take a Full-Page Screenshot as .pdf Without Page Breaks, Regardless of Page Dimensions


I can see that Selenium can take screenshots, but they are always saved as .png files. How can I take the same style of screenshot as a .pdf?

Required style: No margins; Same dimensions as current page (like a full page screenshot)
Printing the page doesn't accomplish this because of all the formatting that comes with printing.

How I currently get a screenshot:

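from selenium import webdriver

# Placeholder output path for the screenshot
PNG_SAVEAS = 'example.png'

# Function to find page size
S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)

# Assumed options; the original snippet did not show how they were built
options = webdriver.FirefoxOptions()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')

# Full page dimensions
height = S('Height')
width = S('Width')

driver.set_window_size(width, height)
driver.get_screenshot_as_file(PNG_SAVEAS)

# quit() ends the session and stops the driver process
driver.quit()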

Solution

  • I found a solution that I could not find documented elsewhere.

    The key is to dynamically set the width and height of the PDF page to match the content being printed. I also found that scaling the result down to 1% of its original size speeds up the process significantly.

    One thing to note: with GeckoDriver I ran into a bug (linked in the code comment below) that caused the PDF to be printed at the wrong size. Multiplying the page size by 2.5352112676056335 resolved it. At the time it was unclear to me why this specific constant works, but without it the PDF's aspect ratio is distorted (rather than the page being scaled down proportionally to ~39% of its desired size). The distortion produces a multi-page .pdf file, which is not the intended outcome.

    This method was tested with GeckoDriver. If you are using Chrome, it is likely that you won't need the RATIO_MULTIPLIER workaround.

    from selenium import webdriver
    from selenium.webdriver.common.print_page_options import PrintOptions
    import base64

    # Bug in geckodriver... seems unrelated, but this won't work otherwise.
    # https://github.com/SeleniumHQ/selenium/issues/12066
    RATIO_MULTIPLIER = 2.5352112676056335

    # Function to find page size
    S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)

    # Scale for PDF size. 1 (no change) takes a long time
    pdf_scaler = .01

    # Browser options. Headless is more reliable for screenshots in my experience.
    options = webdriver.FirefoxOptions()
    options.add_argument('--headless')

    # Launch webdriver, navigate to destination
    driver = webdriver.Firefox(options=options)
    driver.get('https://www.google.com')

    # Find full page dimensions regardless of scroll
    height = S('Height')
    width = S('Width')

    # Dynamic setting of PDF page dimensions
    print_options = PrintOptions()
    print_options.page_height = (height*pdf_scaler)*RATIO_MULTIPLIER
    print_options.page_width = (width*pdf_scaler)*RATIO_MULTIPLIER
    print_options.shrink_to_fit = True

    # Print to PDF (returns base64-encoded data, which must be saved)
    pdf = driver.print_page(print_options=print_options)
    driver.quit()

    # Save the output to a file
    with open('example.pdf', 'wb') as file:
        file.write(base64.b64decode(pdf))
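
Since print_page returns base64 text rather than a file, the save step can be wrapped in a small helper that also checks that the decoded bytes look like a PDF before writing them. This is only a sketch: save_base64_pdf is a hypothetical name, not part of Selenium, and the header check is a basic sanity test, not full validation.

```python
import base64
from pathlib import Path


def save_base64_pdf(encoded: str, path: str) -> int:
    """Decode base64 PDF data, check the PDF header, and write it to path.

    Returns the number of bytes written.
    """
    data = base64.b64decode(encoded)
    # PDF files begin with the marker "%PDF-"; reject anything else
    if not data.startswith(b"%PDF-"):
        raise ValueError("decoded data does not look like a PDF")
    return Path(path).write_bytes(data)
```

With the script above, the final two lines could then become save_base64_pdf(pdf, 'example.pdf').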
    

    Versions used:

    geckodriver 0.31.0
    Firefox 113.0.1
    selenium==4.9.1
    Python 3.11.2
    Windows 10  
    

    Edit: the multiplier is needed because the page dimensions in PrintOptions are in cm, not inches. 2.5352112676056335 is approximately the inches->cm conversion rate (exactly 2.54) :)
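
To make the unit conversion explicit, here is a minimal sketch that turns pixel dimensions into the centimeter values PrintOptions expects. It assumes the page is laid out at the CSS reference density of 96 pixels per inch, which is an assumption and not something tested in the answer above. The answer's combined factor is 0.01 × 2.5352... ≈ 0.02535 cm per pixel, while 2.54 / 96 ≈ 0.02646 cm per pixel, so the two approaches give slightly different page sizes. px_to_cm and page_size_cm are hypothetical helper names.

```python
CM_PER_INCH = 2.54       # exact, by definition of the inch
CSS_PX_PER_INCH = 96     # assumed: CSS reference pixel, 1 px = 1/96 in


def px_to_cm(pixels: float) -> float:
    """Convert a length in CSS pixels to centimeters."""
    return pixels / CSS_PX_PER_INCH * CM_PER_INCH


def page_size_cm(width_px: float, height_px: float) -> tuple[float, float]:
    """Return (width_cm, height_cm) for a page of the given pixel size."""
    return px_to_cm(width_px), px_to_cm(height_px)


if __name__ == "__main__":
    # Hypothetical scroll dimensions, standing in for S('Width') and S('Height')
    width_cm, height_cm = page_size_cm(1280, 4000)
    print(f"{width_cm:.2f} cm x {height_cm:.2f} cm")
```

The results would be assigned to print_options.page_width and print_options.page_height in place of the scaled values. Because both dimensions go through the same conversion, the aspect ratio of the page is preserved. Whether the margin settings on PrintOptions also need to be set to 0 to get a margin-free page was not tested here.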