python html quarto py-shiny

Converting or printing all sections of a rendered Quarto document into PDF in one go


I want to convert the Shiny for Python documentation to PDF. Opening each section and printing it to PDF works, but I am wondering whether there is a more compact way to print all sections in one go.


Solution

  • I can propose a solution based on wkhtmltopdf and Python: scrape the links to the HTML files for the different sections of the docs and pass them to pdfkit, a Python library that wraps the wkhtmltopdf utility for converting HTML to PDF.

    First, download wkhtmltopdf and install it on your system (see the wkhtmltopdf installation instructions if you need help; if you are a Windows user, remember to add wkhtmltopdf to PATH).

    Then check that it is available from cmd/shell:

    $ wkhtmltopdf --version
    
    # wkhtmltopdf 0.12.6 (with patched qt)
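
    The same availability check can also be done from Python before any conversion starts. This is only a minimal sketch using the standard library; the helper names `find_wkhtmltopdf` and `wkhtmltopdf_version` are made up for illustration and are not part of pdfkit.

```python
import shutil
import subprocess


def find_wkhtmltopdf(name="wkhtmltopdf"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def wkhtmltopdf_version(path):
    """Run `<path> --version` and return whatever the tool prints."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    exe = find_wkhtmltopdf()
    if exe is None:
        print("wkhtmltopdf not found on PATH; install it or fix PATH first")
    else:
        print(wkhtmltopdf_version(exe))
```

    If the first helper returns None, pdfkit will most likely fail as well, so it is worth fixing PATH before going further.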
    

    Next, install these Python libraries (assuming you have Python installed):

    pip install requests beautifulsoup4 pdfkit
    

    and then run this Python script:

    $ python html2pdf.py
    

    html2pdf.py

    
    import re
    import pdfkit
    import requests
    from bs4 import BeautifulSoup
    
    # Fetch one docs page; its sidebar links to every section
    r = requests.get('https://shiny.rstudio.com/py/docs/get-started.html')
    r.raise_for_status()  # stop early if the page could not be fetched
    
    # Parse the HTML and collect the sidebar anchors
    soup = BeautifulSoup(r.content, 'html.parser')
    a = soup.find_all('a', class_='sidebar-link')
    
    # Get the links; link[2:] assumes each href starts with '..'
    links = [link.get('href') for link in a if link.get('href') is not None]
    site_link = 'https://shiny.rstudio.com/py'
    full_links = [site_link + link[2:] for link in links]
    
    # File names: the last path segment without the ".html" extension
    names = [re.findall(r"(?:.+/)(.+)(?:\.html)", link)[0] for link in full_links]
    
    # Convert each linked HTML page to its own PDF
    for link, name in zip(full_links, names):
        pdfkit.from_url(link, f"{name}.pdf")
    
    

    This converts all the HTML pages linked in the sidebar of https://shiny.rstudio.com/py/docs/ into PDF files in one go.
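
    The script builds URLs by slicing the href strings and extracts file names with a regex. As a hedged alternative, the same two steps can be sketched with the standard library only, which also copes with hrefs that do not start with `..`. The helper names `absolute_url` and `pdf_name` and the sample hrefs are assumptions for illustration, not values taken from the site.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


def absolute_url(page_url, href):
    """Resolve a relative href against the page it was found on."""
    return urljoin(page_url, href)


def pdf_name(url):
    """Use the last path segment, minus its extension, as the PDF name."""
    return PurePosixPath(urlparse(url).path).stem + ".pdf"


page = "https://shiny.rstudio.com/py/docs/get-started.html"
hrefs = ["../docs/overview.html", "../docs/ui-html.html"]  # hypothetical hrefs

urls = [absolute_url(page, h) for h in hrefs]
files = [pdf_name(u) for u in urls]
print(urls)
print(files)
```

    The resulting `urls` and `files` lists can be fed to `pdfkit.from_url` in the same loop as in the script.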

    $ ls
    
    get-started.pdf            reactive-programming.pdf  ui-navigation.pdf
    html2pdf.py                reactive-values.pdf       ui-page-layouts.pdf
    overview.pdf               running-debugging.pdf     ui-static.pdf
    putting-it-together.pdf    server.pdf                user-interface.pdf
    reactive-calculations.pdf  ui-dynamic.pdf            workflow-modules.pdf
    reactive-events.pdf        ui-feedback.pdf           workflow-server.pdf
    reactive-mutable.pdf       ui-html.pdf
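
    If a single PDF is preferred over one file per section, pdfkit's documentation shows `from_url` also accepting a list of URLs and writing them to one output file. The sketch below relies on that behaviour; the helper names and the output file name are assumptions, and the de-duplication step is only a precaution in case the sidebar repeats a link.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the first-seen order."""
    return list(dict.fromkeys(urls))


def urls_to_single_pdf(urls, output_path):
    """Render all URLs into one PDF (needs pdfkit and wkhtmltopdf)."""
    import pdfkit  # imported lazily so unique_in_order works without it

    pdfkit.from_url(unique_in_order(urls), output_path)


# Example call, assuming full_links comes from the scraping script:
# urls_to_single_pdf(full_links, "shiny-docs.pdf")
```

    The pages should appear in the PDF in the order of the list, so the sidebar order is preserved as long as the list is not re-sorted.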