python html pdf qprinter

How to convert a webpage into a PDF using Python


I was looking for a way to print a webpage to a local PDF file using Python. One good solution is to use Qt, as described at https://bharatikunal.wordpress.com/2010/01/.

It didn't work at first because I had a problem with the installation of PyQt4: it gave error messages such as 'ImportError: No module named PyQt4.QtCore'.

This was because PyQt4 was not installed properly. I am used to having libraries located under C:\Python27\Lib, but that was not the case for PyQt4.

In fact, all that is needed is to download PyQt4 from http://www.riverbankcomputing.com/software/pyqt/download (make sure it matches the Python version you are using) and install it to C:\Python27 (in my case). That's it.
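
A quick way to confirm that the installation worked is to try importing the modules the script needs and see where they are loaded from. The snippet below is only a minimal sketch: the helper name locate_module is made up for illustration, and the "(built-in)" label is just a placeholder for modules that have no file on disk.

```python
import importlib


def locate_module(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Same failure as 'ImportError: No module named ...'
        return None
    # Built-in modules have no __file__, so report that explicitly.
    return getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # The three PyQt4 modules imported by the script in this post.
    for name in ("PyQt4.QtCore", "PyQt4.QtGui", "PyQt4.QtWebKit"):
        path = locate_module(name)
        print("%s -> %s" % (name, path if path else "NOT INSTALLED"))
```

If any of the three lines reports NOT INSTALLED, the PyQt4 installer most likely targeted a different Python installation than the one running the script.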

Now the script runs fine, so I want to share it. For more options when using QPrinter, please refer to http://qt-project.org/doc/qt-4.8/qprinter.html#Orientation-enum.


Solution

  • Thanks to the posts below, I was able to add the URL of the printed webpage and the current time to the generated PDF, no matter how many pages it has.

    Add text to Existing PDF using Python

    https://github.com/disflux/django-mtr/blob/master/pdfgen/doc_overlay.py

    The script is shared below:

    import time
    from pyPdf import PdfFileWriter, PdfFileReader
    import StringIO
    from reportlab.pdfgen import canvas
    from reportlab.lib.pagesizes import letter
    import sys 
    from PyQt4.QtCore import *
    from PyQt4.QtGui import * 
    from PyQt4.QtWebKit import * 
    
    url = 'http://www.yahoo.com'
    tem_pdf = "c:\\tem_pdf.pdf"
    final_file = "c:\\younameit.pdf"
    
    app = QApplication(sys.argv)
    web = QWebView()
    #Read the URL given
    web.load(QUrl(url))
    printer = QPrinter()
    #setting format
    printer.setPageSize(QPrinter.A4)
    printer.setOrientation(QPrinter.Landscape)
    printer.setOutputFormat(QPrinter.PdfFormat)
    #export file as c:\tem_pdf.pdf
    printer.setOutputFileName(tem_pdf)
    
    def convertIt():
        web.print_(printer)
        QApplication.exit()
    
    QObject.connect(web, SIGNAL("loadFinished(bool)"), convertIt)
    
    app.exec_()
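    # app.exec_() returns once convertIt() calls QApplication.exit();
    # the script then continues with the overlay step below.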
    
    # Add the URL and the current date and time as text on the generated PDF
    
    packet = StringIO.StringIO()
    # create a new PDF with Reportlab
    can = canvas.Canvas(packet, pagesize=letter)
    can.setFont("Helvetica", 9)
    # Write the URL and the current date and time at the bottom of the page
    oknow = time.strftime("%a, %d %b %Y %H:%M")
    can.drawString(5, 2, url)
    can.drawString(605, 2, oknow)
    can.save()
    
    #move to the beginning of the StringIO buffer
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    # read your existing PDF
    existing_pdf = PdfFileReader(open(tem_pdf, "rb"))
    pages = existing_pdf.getNumPages()
    output = PdfFileWriter()
    # add the "watermark" (which is the new pdf) on the existing page
    for x in range(0,pages):
        page = existing_pdf.getPage(x)
        page.mergePage(new_pdf.getPage(0))
        output.addPage(page)
    # finally, write "output" to a real file
    outputStream = open(final_file, "wb")
    output.write(outputStream)
    outputStream.close()
    
    print final_file, 'is ready.'
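
One thing to keep in mind about the footer: the positions 5 and 605 are hard-coded in PDF points and only fit because the page is A4 landscape (about 842 points wide). The sketch below uses only the standard library to show where those numbers come from. The helper names page_size_points and footer_items, and the 237-point gap from the right edge, are assumptions made for illustration and are not part of the script above.

```python
import time

MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # one PDF point is 1/72 inch


def page_size_points(width_mm, height_mm, landscape=False):
    """Convert a page size in millimetres to PDF points."""
    width = width_mm / MM_PER_INCH * POINTS_PER_INCH
    height = height_mm / MM_PER_INCH * POINTS_PER_INCH
    if landscape:
        # Landscape means the longer side becomes the width.
        width, height = max(width, height), min(width, height)
    return width, height


def footer_items(url, page_width, when=None, margin=5, stamp_gap=237):
    """Return (x, y, text) tuples for the URL and the timestamp."""
    if when is None:
        when = time.localtime()
    stamp = time.strftime("%a, %d %b %Y %H:%M", when)
    # URL near the left edge, timestamp a fixed gap from the right edge.
    return [(margin, 2, url), (page_width - stamp_gap, 2, stamp)]


if __name__ == "__main__":
    width, height = page_size_points(210, 297, landscape=True)  # A4
    for x, y, text in footer_items("http://www.yahoo.com", width):
        print("%.1f, %d: %s" % (x, y, text))
```

With A4 landscape the timestamp lands at roughly x = 605, matching the script. On a portrait A4 page (about 595 points wide) the same hard-coded 605 would fall outside the page, so the position has to be derived from the page width if the orientation changes.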