python web-scraping

How to extract data from a graph in table format as CSV?


I need to scrape the data from this graph and get it in tabular format. Link

The problem is the structure of this graph: months appear between the years. I have tried some online scrapers, but they take too much time and sometimes return distorted data.

More specifically, I am using this software, which I mention because it may help other people in the same situation: app

What would you suggest for scraping this with the best results? I need to scrape a lot of graphs like this one :(


Solution

  • The graph data is embedded in a <script> tag on the page, so you can pull it out with a regular expression and parse it as JSON. For example:

    import json
    import re
    
    import pandas as pd
    import requests
    
    
    url = "https://www.instat.gov.al/en/sdgs/no-poverty/12-by-2030-reduce-at-least-by-half-the-proportion-of-men-women-and-children-of-all-ages-living-in-poverty-in-all-its-dimensions-according-to-national-definitions/121-proportion-of-population-living-below-the-national-poverty-line-by-sex-and-age/"
    
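    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
    html_text = response.text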
    
    # for map data:
    # map_data = re.search(r"mapData=(.*?);<", html_text).group(1)
    # print(map_data)
    
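    # the page assigns the graph data to graphsDataJson as a JSON literal
    graph_data = re.search(r"graphsDataJson=(.*?);<", html_text).group(1)
    graph_data = json.loads(graph_data)
    
    # first entry of the list; its indicatorDataValues hold the year/value pairs
    df = pd.DataFrame(graph_data[0]["indicatorDataValues"])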
    print(df)
    

    Prints:

       year  value
    0  2017   23.7
    1  2018   23.4
    2  2019   23.0
    3  2020   21.8
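
Since the question asks for CSV and for many graphs, here is a minimal sketch of how the same extraction could be wrapped in a function and written out as CSV. It uses only the standard library and runs on a small made-up HTML string, so it needs no network access. The names `graph_rows`, `rows_to_csv` and the extra `graph` column are hypothetical, and it assumes every entry in `graphsDataJson` has an `indicatorDataValues` list with the same keys as the one shown above, which may not hold on every page.

```python
import csv
import io
import json
import re

# Same pattern as in the answer: the JSON sits between "graphsDataJson=" and ";<".
GRAPH_RE = re.compile(r"graphsDataJson=(.*?);<", re.DOTALL)


def graph_rows(html_text):
    """Yield one dict per data point, tagged with the index of its graph."""
    match = GRAPH_RE.search(html_text)
    if match is None:
        raise ValueError("graphsDataJson not found in page")
    for i, graph in enumerate(json.loads(match.group(1))):
        # Assumption: every graph entry carries an indicatorDataValues list.
        for point in graph["indicatorDataValues"]:
            yield {"graph": i, **point}


def rows_to_csv(rows, out):
    """Write an iterable of dicts to the file-like object `out` as CSV."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


# Made-up page for illustration only; a real page would come from requests.get(url).text
sample_html = (
    '<script>var graphsDataJson=[{"indicatorDataValues":'
    '[{"year":2017,"value":23.7},{"year":2018,"value":23.4}]}];</script>'
)

buffer = io.StringIO()
rows_to_csv(graph_rows(sample_html), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk, pass a file opened with `open("some_name.csv", "w", newline="")` instead of the `StringIO` buffer. For several pages, you could loop over a list of URLs and call `graph_rows` on each downloaded page. If you already have the pandas DataFrame from the answer, `df.to_csv("some_name.csv", index=False)` writes it directly.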