
Convert inner tbody HTML content to JSON


I'm trying to scrape data from a website. I only need the inner HTML of a tbody element, which I want to convert to JSON so that it is easier to work with and can be saved to a file later. So far I've only managed to read elements one at a time using find_element(By.XPATH) from selenium. Is there a way to read the whole inner HTML of the tbody and then parse it to JSON? requests won't work, since the table is inside an iframe.
On the website, the tbody is the scrolling table titled "Tình hình dịch cả nước" ("Epidemic situation nationwide"). I only want the table minus the title, and the header of the table if possible.
The code I use to read a single element:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("https://covid19.gov.vn/")
time.sleep(3)  # crude wait for the page and its iframe to load
browser.switch_to.frame(browser.find_element(By.XPATH, '/html/body/div[1]/div[2]/div[3]/div/iframe'))
value = browser.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div[1]/span[4]')
print(value.text)

Solution

  • Just call the same endpoint the page itself calls; it returns JSON, so there is no need to parse the HTML.

    import requests
    import pandas as pd

    # Same JSON endpoint the page requests
    data = requests.get('https://static.pipezero.com/covid/data.json').json()
    location_json = data['locations']
    df = pd.DataFrame(location_json)
    print(df)
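
Since the question also mentions saving the data to a file, here is a minimal sketch of writing the 'locations' list out as JSON. The helper name save_locations, the output filename, and the sample records (including their field names) are made up for illustration; in practice you would pass in the parsed response from the request above.

```python
import json
from pathlib import Path


def save_locations(payload: dict, path: str) -> list:
    """Write payload['locations'] to `path` as UTF-8 JSON and return the list."""
    locations = payload["locations"]
    # ensure_ascii=False keeps Vietnamese names readable in the output file
    text = json.dumps(locations, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return locations


# Hypothetical sample standing in for requests.get(...).json();
# the real field names may differ.
sample = {
    "locations": [
        {"name": "Hà Nội", "cases": 10},
        {"name": "Đà Nẵng", "cases": 3},
    ]
}

saved = save_locations(sample, "locations.json")

# Read the file back to check that it round-trips
loaded = json.loads(Path("locations.json").read_text(encoding="utf-8"))
print(loaded)
```

If you already have the DataFrame, pandas' df.to_json with orient="records" is another way to get a similar file.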