python selenium pdf

How to download an embedded PDF from a webpage using Selenium?


I want to download an embedded PDF from a webpage using Selenium.

For example, a page like this: https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html

I tried the code below, but it did not work.

def download_pdf(lnk):

    from selenium import webdriver
    from time import sleep

    options = webdriver.ChromeOptions()

    download_folder = "/*My folder*/"    

    profile = {"plugins.plugins_list": [{"enabled": False,
                                         "name": "Chrome PDF Viewer"}],
               "download.default_directory": download_folder,
               "download.extensions_to_open": ""}

    options.add_experimental_option("prefs", profile)

    print("Downloading file from link: {}".format(lnk))

    driver = webdriver.Chrome('/*Path of chromedriver*/',chrome_options = options)
    driver.get(lnk)
    imp_by1 = driver.find_element_by_id("secondaryToolbarToggle")
    imp_by1.click()
    imp_by = driver.find_element_by_id("secondaryDownload")
    imp_by.click()

    print("Status: Download Complete.")

    driver.close()

download_pdf('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')

Any help is appreciated.

Thanks in advance!!


Solution

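  • Here you go. The PDF viewer is embedded in an iframe, so open the iframe's src directly and click the download button there. Description in code: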

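    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    import os
    
    # initialise browser (Selenium 4 API; chromedriver is expected in the current directory)
    browser = webdriver.Chrome(service=Service(os.path.join(os.getcwd(), 'chromedriver')))
    # load page with iframe
    browser.get('https://www.sebi.gov.in/enforcement/orders/jun-2019/adjudication-order-in-respect-of-three-entities-in-the-matter-of-prism-medico-and-pharmacy-ltd-_43323.html')
    
    # find pdf url: the viewer lives in an iframe, so read its src attribute
    pdf_url = browser.find_element(By.TAG_NAME, 'iframe').get_attribute("src")
    # load page with pdf
    browser.get(pdf_url)
    # download file by clicking the viewer's download button
    download = browser.find_element(By.ID, 'download')
    download.click()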
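
If clicking the viewer's button proves unreliable, the file could also be fetched without the browser once the iframe src is known. The following is only a sketch under assumptions: it assumes the iframe src points at a PDF.js-style viewer that carries the document location in a `file` query parameter, which I have not verified for this page. The names `pdf_url_from_viewer` and `save_pdf` are hypothetical helpers, not part of Selenium.

```python
import shutil
from urllib.parse import parse_qs, urljoin, urlparse
from urllib.request import urlopen


def pdf_url_from_viewer(viewer_url):
    """Return the URL held in the viewer's `file` query parameter.

    Assumption: the viewer URL looks like `.../web/?file=<pdf location>`.
    Falls back to the viewer URL itself when no such parameter is present.
    """
    files = parse_qs(urlparse(viewer_url).query).get("file")
    if not files:
        return viewer_url
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(viewer_url, files[0])


def save_pdf(url, destination):
    """Stream the response body at `url` into the file `destination`."""
    with urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

With the `pdf_url` value read from the iframe above, usage might look like `save_pdf(pdf_url_from_viewer(pdf_url), "order.pdf")`. A plain request carries none of the browser's cookies or headers, so a site that checks them may refuse it; in that case the click-based approach is the safer choice.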