pythonpandasdata-analysis

I want to connect to an api and extract the data


I'm doing my own python analytics project on compiling and analysing data from this open-data source:

(https://data.gov.ie/dataset?q=homeless&api=true&sort=score+desc%2C+metadata_created+desc&theme=Housing)

I've never worked with api's or json's before.. all the info on google or YouTube video's always have an API key.. but I don't know how to get it

so far I've done this:

import requests
import pandas as pd
import time

API_KEY = requests.get('https://data.gov.ie/dataset?q=homelessness&api=true&theme=Housing&sort=metadata_modified+desc')
API_KEY.status_code
# this returns 200 which from google means connected status correct

Then I write this:

#make api call
response = API_KEY.json()

and get back errors:

JSONDecodeError                           Traceback (most recent call last)
~/opt/anaconda3/lib/python3.9/site-packages/requests/models.py in json(self, **kwargs)
    970         try:
--> 971             return complexjson.loads(self.text, **kwargs)
    972         except JSONDecodeError as e:

~/opt/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
...
...
...

About The Project: What I want it to connect to the data on homelessness in ireland (I would like the data to be easily updatable for future automatic updating but that's later on.. no idea how to do it yet). Ireland is facing a serious crisis with homelessness and housing in general and I would like to see how it has worsened over the years and visualise it. Perhaps get some insight or the scale of the issue. After I do some work with the project I will then import the data into tableau to do some more visualisation then publish my report to my LinkedIn. Perhaps you can advise me if I should be trying to work with csv's or json's?

Any help as always would be sincerely appreciated.


Solution

  • An API key is generally used to access private data and/or make some write operations. It's not the case here, all data are public.

    What you need is to get all data from 'Homelessness Report'. You have to proceed in 3 steps:

    1. Query the database
    2. Process and filter the result
    3. Download and merge data
    import requests
    
    # 1. Query the database
    q_url = 'https://data.gov.ie/api/3/action/package_search?q=homelessness&rows=100'
    packages = requests.get(q_url).json()['result']
    
    # 2a. Process results
    df = pd.json_normalize(packages['results'], 'resources', 'title')
    
    # 2b. Filter results
    mask = df['title'].str.startswith('Homelessness Report') & df['format'].eq('CSV')
    df = df.loc[mask, ['title', 'url']]
    
    # 3a. Download data
    data = {}
    for idx, row in df.iterrows():
        dt = pd.to_datetime(row.title.strip('Homelessness Report '))
        data[dt] = pd.read_csv(row.url)
    
    # 4a. Merge data
    out = (pd.concat(data, axis=0).droplevel(1).rename_axis('Date')
             .sort_index().reset_index())
    out.to_excel('Homelessness Report.xlsx', index=False)
    

    Output:

    >>> out
              Date      Region Total Adults  ... Number of people with citizenship EEA/Uk Number of people with citizenship Non-EEA  Number of Child Dependants in Families
    0   2019-01-01    Mid-East          254  ...                                      NaN                                       NaN                                     NaN
    1   2019-01-01    Mid-West          338  ...                                      NaN                                       NaN                                     NaN
    2   2019-01-01  North-East          162  ...                                      NaN                                       NaN                                     NaN
    3   2019-01-01  North-West           52  ...                                      NaN                                       NaN                                     NaN
    4   2019-01-01  South-East          316  ...                                      NaN                                       NaN                                     NaN
    ..         ...         ...          ...  ...                                      ...                                       ...                                     ...
    436 2023-01-01    Mid-West          414  ...                                       45                                        32                                     NaN
    437 2023-01-01    Midlands          148  ...                                       17                                        19                                     NaN
    438 2023-01-01    Mid-East          437  ...                                       81                                        66                                     NaN
    439 2023-01-01      Dublin         5946  ...                                     1506                                      1190                                     NaN
    440 2023-01-01        West          321  ...                                       34                                        19                                     NaN
    
    [441 rows x 21 columns]