I need to scrape the data from this graph into tabular format. Link
The problem is the structure of the graph: the months sit in the middle of the years. I have tried some online scrapers, but they take too much time and sometimes return distorted data.
In more detail, I am using this software, which I mention because it may help other people like me: app
What do you suggest I use to get the best results? I need to scrape a lot of graphs of this kind :(
The data for the graph is embedded inside a <script> tag, so you can extract it with a regular expression and parse it as JSON, for example:
import json
import re
import pandas as pd
import requests
url = "https://www.instat.gov.al/en/sdgs/no-poverty/12-by-2030-reduce-at-least-by-half-the-proportion-of-men-women-and-children-of-all-ages-living-in-poverty-in-all-its-dimensions-according-to-national-definitions/121-proportion-of-population-living-below-the-national-poverty-line-by-sex-and-age/"
html_text = requests.get(url).text
# for map data:
# map_data = re.search(r"mapData=(.*?);<", html_text).group(1)
# print(map_data)
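# the graph data is a JSON literal assigned to graphsDataJson inside a <script> tag:
graph_data = re.search(r"graphsDataJson=(.*?);<", html_text).group(1)
graph_data = json.loads(graph_data)
# first graph on the page, one row per data point:
df = pd.DataFrame(graph_data[0]["indicatorDataValues"])
print(df)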
Prints:
   year  value
0  2017   23.7
1  2018   23.4
2  2019   23.0
3  2020   21.8
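Since you mention having many graphs of this kind, the same extraction can be wrapped in a small helper and called once per page. This is only a sketch: the function name `extract_graph_rows` and the `sample_html` string are made up for illustration, and it assumes every entry in `graphsDataJson` keeps its points under `indicatorDataValues`, which I have only checked for the page above.

```python
import json
import re

# Same pattern as above: the JSON literal assigned to graphsDataJson,
# up to the ";" that directly precedes the closing </script> tag.
GRAPH_RE = re.compile(r"graphsDataJson=(.*?);<")


def extract_graph_rows(html_text):
    """Return one list of row dicts per graph found in the page source."""
    match = GRAPH_RE.search(html_text)
    if match is None:
        # No embedded graph data on this page (or the markup has changed).
        return []
    graphs = json.loads(match.group(1))
    # Assumption: each graph entry stores its points under "indicatorDataValues".
    return [graph.get("indicatorDataValues", []) for graph in graphs]


# Made-up page source for illustration; for a real page use requests.get(url).text
sample_html = (
    '<script>var graphsDataJson=[{"indicatorDataValues": '
    '[{"year": 2017, "value": 23.7}, {"year": 2018, "value": 23.4}]}];</script>'
)

tables = extract_graph_rows(sample_html)
print(tables[0])
```

Each inner list can be passed to pd.DataFrame(...) exactly as in the snippet above, and then written out with df.to_csv(...) if you need files. Returning an empty list instead of raising makes it easier to loop over many URLs and skip pages that have no graph.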