I'm doing my own python analytics project on compiling and analysing data from this open-data source:
I've never worked with api's or json's before.. all the info on google or YouTube video's always have an API key.. but I don't know how to get it
so far I've done this:
import requests
import pandas as pd
import time
API_KEY = requests.get('https://data.gov.ie/dataset?q=homelessness&api=true&theme=Housing&sort=metadata_modified+desc')
API_KEY.status_code
# this returns 200 which from google means connected status correct
Then I write this:
#make api call
response = API_KEY.json()
and get back errors:
JSONDecodeError Traceback (most recent call last)
~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py in json(self, **kwargs)
970 try:
--> 971 return complexjson.loads(self.text, **kwargs)
972 except JSONDecodeError as e:
~/opt/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
345 parse_constant is None and object_pairs_hook is None and not kw):
--> 346 return _default_decoder.decode(s)
347 if cls is None:
...
...
...
About The Project: What I want it to connect to the data on homelessness in ireland (I would like the data to be easily updatable for future automatic updating but that's later on.. no idea how to do it yet). Ireland is facing a serious crisis with homelessness and housing in general and I would like to see how it has worsened over the years and visualise it. Perhaps get some insight or the scale of the issue. After I do some work with the project I will then import the data into tableau to do some more visualisation then publish my report to my LinkedIn. Perhaps you can advise me if I should be trying to work with csv's or json's?
Any help as always would be sincerely appreciated.
An API key is generally used to access private data and/or make some write operations. It's not the case here, all data are public.
What you need is to get all data from 'Homelessness Report'. You have to proceed in 3 steps:
import requests
# 1. Query the database
q_url = 'https://data.gov.ie/api/3/action/package_search?q=homelessness&rows=100'
packages = requests.get(q_url).json()['result']
# 2a. Process results
df = pd.json_normalize(packages['results'], 'resources', 'title')
# 2b. Filter results
mask = df['title'].str.startswith('Homelessness Report') & df['format'].eq('CSV')
df = df.loc[mask, ['title', 'url']]
# 3a. Download data
data = {}
for idx, row in df.iterrows():
dt = pd.to_datetime(row.title.strip('Homelessness Report '))
data[dt] = pd.read_csv(row.url)
# 4a. Merge data
out = (pd.concat(data, axis=0).droplevel(1).rename_axis('Date')
.sort_index().reset_index())
out.to_excel('Homelessness Report.xlsx', index=False)
Output:
>>> out
Date Region Total Adults ... Number of people with citizenship EEA/Uk Number of people with citizenship Non-EEA Number of Child Dependants in Families
0 2019-01-01 Mid-East 254 ... NaN NaN NaN
1 2019-01-01 Mid-West 338 ... NaN NaN NaN
2 2019-01-01 North-East 162 ... NaN NaN NaN
3 2019-01-01 North-West 52 ... NaN NaN NaN
4 2019-01-01 South-East 316 ... NaN NaN NaN
.. ... ... ... ... ... ... ...
436 2023-01-01 Mid-West 414 ... 45 32 NaN
437 2023-01-01 Midlands 148 ... 17 19 NaN
438 2023-01-01 Mid-East 437 ... 81 66 NaN
439 2023-01-01 Dublin 5946 ... 1506 1190 NaN
440 2023-01-01 West 321 ... 34 19 NaN
[441 rows x 21 columns]