Tags: python-3.x, nlp, arabic

How to download a webpage as .mhtml


I am able to successfully open a URL and save the resultant page as a .html file. However, I am unable to determine how to download and save a .mhtml (Web Page, Single File).

My code is:

import urllib.parse
import urllib.request

url = 'https://www.example.com'

# Percent-encode the whole URL so it can be passed as the "u" query parameter
encoded_url = urllib.parse.quote(url, safe='')
print(encoded_url)

base_url = 'https://translate.google.co.uk/translate?sl=auto&tl=en&u='
translation_url = base_url + encoded_url
print(translation_url)

req = urllib.request.Request(translation_url, headers={'User-Agent': 'Mozilla/6.0'})

# urlopen() blocks until the server responds; read() returns the body as bytes
with urllib.request.urlopen(req) as response:
    webContent = response.read()

# Write the raw bytes; the "with" block closes the file on exit
with open('GoogleTranslated.html', 'wb') as f:
    f.write(webContent)

I have tried to use wget, following the details in the question "How to download a webpage (mhtml format) using wget in python", but those details are incomplete (or I am simply unable to understand them).

Any suggestions would be helpful at this stage.


Solution

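  • Did you try using Selenium with a Chrome WebDriver to save the page? The script below opens the page, then uses pyautogui to drive the browser's 'Save as...' dialog.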

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.expected_conditions import visibility_of_element_located
    from selenium.webdriver.support.ui import WebDriverWait
    import pyautogui
    
    URL = 'https://en.wikipedia.org/wiki/Python_(programming_language)'
    FILE_NAME = ''
    
    # open page with selenium
    # (first need to download Chrome webdriver, or a firefox webdriver, etc)
    driver = webdriver.Chrome()
    driver.get(URL)
    
    
    # wait until body is loaded
    WebDriverWait(driver, 60).until(visibility_of_element_located((By.TAG_NAME, 'body')))
    time.sleep(1)
    # open 'Save as...' to save html and assets
    pyautogui.hotkey('ctrl', 's')
    time.sleep(1)
    # type a file name only if one was given; otherwise keep the dialog's default
    if FILE_NAME != '':
        pyautogui.typewrite(FILE_NAME)
    # confirm the dialog
    pyautogui.press('enter')
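
The keyboard-driven approach above saves whatever file type the 'Save as...' dialog currently has selected, so it may not produce a .mhtml file. One possible alternative, sketched here under assumptions, is to ask Chrome for an MHTML snapshot through the Chrome DevTools Protocol command `Page.captureSnapshot`. This assumes Selenium 4 with a Chromium-based driver, where `execute_cdp_cmd` is available. The helper names `save_mhtml` and `capture_page` are hypothetical and used only for illustration.

```python
def save_mhtml(driver, path):
    """Ask a Chromium-based driver for an MHTML snapshot and write it to path."""
    # Page.captureSnapshot is a Chrome DevTools Protocol command; it returns
    # a dict whose 'data' entry holds the serialized page as text.
    snapshot = driver.execute_cdp_cmd('Page.captureSnapshot', {'format': 'mhtml'})
    # newline='' keeps the line endings exactly as the browser produced them.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(snapshot['data'])


def capture_page(url, path):
    """Hypothetical helper: open url in Chrome and save it as one .mhtml file."""
    # Imported here so save_mhtml can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.expected_conditions import visibility_of_element_located
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Same wait as in the answer above: block until <body> is visible.
        WebDriverWait(driver, 60).until(
            visibility_of_element_located((By.TAG_NAME, 'body'))
        )
        save_mhtml(driver, path)
    finally:
        # Always close the browser, even if loading or saving fails.
        driver.quit()
```

Calling `capture_page(translation_url, 'GoogleTranslated.mhtml')` with the URL built in the question would be one way to try this. Since no dialog is involved, pyautogui and the fixed `time.sleep` calls are not needed in this variant.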