Currently, I see it is possible to create screenshots with Selenium. However, they are always .png files. How can I take the same style of screenshot, but as a .pdf?
Required style: No margins; Same dimensions as current page (like a full page screenshot)
Printing the page doesn't accomplish this because of all the formatting that comes with printing.
How I currently get a screenshot:
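from selenium import webdriver
# Output path for the screenshot (placeholder value)
PNG_SAVEAS = 'example.png'
# Function to find page size
S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
# Browser options (headless is assumed here; the original snippet did not define them)
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')
# Resize the window to the full page, then take the screenshot
height = S('Height')
width = S('Width')
driver.set_window_size(width, height)
driver.get_screenshot_as_file(PNG_SAVEAS)
driver.close()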
To achieve the desired result, I found a solution that was not readily available elsewhere.
The key is to dynamically configure the width and height of the PDF page to match the content being printed. Additionally, I discovered that scaling down the result to only 1% of its original size speeds up the process significantly.
One thing to note is that when using GeckoDriver, I encountered a bug (see the GitHub issue linked in the code below) that caused the resulting PDF to be printed with the wrong size. However, I found that multiplying the size by 2.5352112676056335 resolved the issue. It's still unclear to me why this specific constant is relevant to my answer, but without applying this fix the PDF's aspect ratio is distorted (rather than scaled down proportionally to ~39% of its desired size). The distortion results in a multi-page .pdf file, which is not the intended outcome.
This method was tested with GeckoDriver. If you are using Chrome, it is likely that you won't need the RATIO_MULTIPLIER workaround.
from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions
import base64
# Bug in geckodriver... seems unrelated, but this won't work otherwise.
# https://github.com/SeleniumHQ/selenium/issues/12066
RATIO_MULTIPLIER = 2.5352112676056335
# Function to find page size
S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
# Scale factor for the PDF page size. A value of 1 (no scaling) takes a long time.
pdf_scaler = .01
# Browser options. Headless is more reliable for screenshots in my experience.
options = webdriver.FirefoxOptions()
options.add_argument('--headless')
# Launch webdriver, navigate to destination
driver = webdriver.Firefox(options=options)
driver.get('https://www.google.com')
# Find full page dimensions regardless of scroll
height = S('Height')
width = S('Width')
# Dynamic setting of PDF page dimensions
print_options = PrintOptions()
print_options.page_height = (height*pdf_scaler)*RATIO_MULTIPLIER
print_options.page_width = (width*pdf_scaler)*RATIO_MULTIPLIER
print_options.shrink_to_fit = True
# Print to PDF (returns base64-encoded data, which must be saved)
pdf = driver.print_page(print_options=print_options)
driver.close()
# Save the output to a file.
with open('example.pdf', 'wb') as file:
    file.write(base64.b64decode(pdf))
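The question also asks for no margins, which the code above does not set explicitly. A minimal sketch of one way to do that, assuming PrintOptions exposes margin_top, margin_bottom, margin_left and margin_right setters measured in centimetres (as it does in Selenium 4, to my knowledge). The helper name zero_margins is hypothetical, and I have not tested this against GeckoDriver.

```python
# Names of the four margin attributes are assumed to follow the
# "margin_<side>" pattern used by Selenium's PrintOptions.
MARGIN_SIDES = ("top", "bottom", "left", "right")


def zero_margins(print_options):
    """Set all four page margins to zero on a PrintOptions-like object.

    The object is modified in place and also returned for convenience.
    """
    for side in MARGIN_SIDES:
        setattr(print_options, f"margin_{side}", 0.0)
    return print_options
```

It could be called as print_options = zero_margins(PrintOptions()) before the page dimensions are assigned.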
Versions used:
geckodriver 0.31.0
Firefox 113.0.1
selenium==4.9.1
Python 3.11.2
Windows 10
Edit: it's because the units here are cm, not inches. 2.5352112676056335 is roughly the inches->cm conversion rate (exactly 2.54) :)
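The unit conversion mentioned in the edit can be spelled out. This is only a sketch, assuming the usual CSS convention of 96 px per inch, which I have not verified against Firefox's print path. The names px_to_cm, CM_PER_INCH and CSS_PX_PER_INCH are hypothetical and do not come from Selenium.

```python
CM_PER_INCH = 2.54       # exact by definition
CSS_PX_PER_INCH = 96     # assumption: standard CSS reference pixel density


def px_to_cm(pixels):
    """Convert a length in CSS pixels to centimetres."""
    return pixels / CSS_PX_PER_INCH * CM_PER_INCH


# Example: a page 1920 px wide, expressed in centimetres
print(px_to_cm(1920))
```

Under that assumption one pixel is about 0.0265 cm, which is close to the 0.01 * 2.5352112676056335 (about 0.0254) used in the code above; shrink_to_fit presumably absorbs the small difference.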