python list dictionary

Collecting data from nested lists and dictionaries without using repeated, nested for/if statements


I've been working a lot with monday.com's API lately. Once I receive the queried data, I pull out specific pieces to compare them with other values, manipulate them, etc. Because of the complexity of the query, I often find myself iterating through nested lists of dictionaries whose values are yet more nested lists of dictionaries. It's fairly easy to iterate through them to find the exact information I'm looking for, but I'd like to know whether there are better practices than using list comprehensions or multiple for/if statements.

Let's use the data below as an example. The goal is to find the id of each entry in "items" whose name equals the part of its "groups" title after the second dash (-).

The first "groups" title to match is 2000 Foo Bar: Mapping. That would return the id of 1234564130.

The second "groups" title is Computer 2023. That would return the id of 1234564074.

So on and so forth.

The code I have below works and is fast enough for my current situation, but I know that using double for loops can become quite slow (O(n²)). And I have 5 nested for loops. Is there a way of digging this specific information out in a more efficient way, time complexity-wise, or at least without so many nested for loops?

{'data': {'boards': [{'groups': [{'title': '123456-G123456.00 - 2000 Foo Bar: Mapping', 'items_page': {'cursor': None, 'items': [{'id': '1234564130', 'name': '2000 Foo Bar: Mapping'}, {'id': '1234564156', 'name': '2000.5 - 2000.5 Ground Model'}]}}, {'title': '123456-R12345.00 - Computer 2023', 'items_page': {'cursor': None, 'items': [{'id': '1234564074', 'name': 'Computer 2023'}, {'id': '1234564096', 'name': '3000.1 - 3000.1 Veggies'}]}}, {'title': '123456-T12345.00 - Dodge - Design', 'items_page': {'cursor': None, 'items': [{'id': '1234564028', 'name': 'Dodge - Design'}, {'id': '1234564048', 'name': '-'}]}}, {'title': 'Group Title', 'items_page': {'cursor': None, 'items': [{'id': '1234563996', 'name': 'Task 1'}]}}]}]}, 'account_id': 123456}

import requests

# my_board_id, apiUrl and headers are defined elsewhere
query_group_id = f"""
{{
    boards (ids: {my_board_id}) {{
        groups {{
            title
            items_page (limit: 50) {{
                cursor
                items {{
                    id
                    name
                }}
            }}
        }}
    }}
}}
"""
data = {'query' : query_group_id}
r = requests.post(url=apiUrl, json=data, headers=headers)
r_dict = r.json()

group_board_info = r_dict['data']['boards'][0]['groups']

# This is absolutely disgusting
for dictionary in group_board_info:
    for k,v in dictionary.items():
        if k == 'title':
            g_name = '-'.join(v.split('-')[2:]).lstrip() # Remove leading white space
        if k == "items_page":
            for k2, v2 in v.items():
                if k2 == 'items':
                    for dictionary2 in v2:
                        vals_list = list(dictionary2.values())
                        keys_list = list(dictionary2.keys())
                        for idx, key in enumerate(keys_list):
                            if key == 'id':
                                if vals_list[1] == g_name:
                                    create_subitem_for_item(vals_list[0], 'init')

Solution

  • Firstly, you don't need all those loops, as discussed in the comments, since dicts support lookups.
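
The dict-lookup point can be sketched in plain Python as follows. This is a minimal sketch, not the code from the comments: `matching_item_ids` is a hypothetical helper name, and `r_dict` is hard-coded from the sample response in the question instead of coming from `r.json()`. Every key is indexed directly, which leaves two loops: one over the groups and one over each group's items.

```python
# Sample response copied from the question (normally this would be r.json()).
r_dict = {'data': {'boards': [{'groups': [
    {'title': '123456-G123456.00 - 2000 Foo Bar: Mapping',
     'items_page': {'cursor': None, 'items': [
         {'id': '1234564130', 'name': '2000 Foo Bar: Mapping'},
         {'id': '1234564156', 'name': '2000.5 - 2000.5 Ground Model'}]}},
    {'title': '123456-R12345.00 - Computer 2023',
     'items_page': {'cursor': None, 'items': [
         {'id': '1234564074', 'name': 'Computer 2023'},
         {'id': '1234564096', 'name': '3000.1 - 3000.1 Veggies'}]}},
    {'title': '123456-T12345.00 - Dodge - Design',
     'items_page': {'cursor': None, 'items': [
         {'id': '1234564028', 'name': 'Dodge - Design'},
         {'id': '1234564048', 'name': '-'}]}},
    {'title': 'Group Title',
     'items_page': {'cursor': None, 'items': [
         {'id': '1234563996', 'name': 'Task 1'}]}},
]}]}, 'account_id': 123456}


def matching_item_ids(groups):
    """Yield the id of every item whose name equals its group's title suffix.

    Hypothetical helper: the name and the skip rule below are assumptions.
    """
    for group in groups:
        # Split on at most two dashes; the third piece is the text after
        # the second dash, even if that text contains dashes itself.
        parts = group['title'].split('-', 2)
        if len(parts) < 3:
            # Fewer than two dashes: no suffix to compare against, so skip.
            continue
        g_name = parts[2].lstrip()
        for item in group['items_page']['items']:
            if item['name'] == g_name:
                yield item['id']


ids = list(matching_item_ids(r_dict['data']['boards'][0]['groups']))
print(ids)  # ['1234564130', '1234564074', '1234564028']
```

Each id could then be passed to `create_subitem_for_item(item_id, 'init')` as in the question. One difference from the question's code: a title with fewer than two dashes is skipped here, whereas the original would compare item names against an empty string. The loops visit every item exactly once, so the running time grows with the total number of items; nesting that merely mirrors the nesting of the data is not quadratic by itself.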

    An easier way to handle nested but uniform data is with Pandas' json_normalize(). It lets you just specify the path(s) to the data you want, then you get a table out (a DataFrame).

    import pandas as pd
    
    df = pd.json_normalize(
        r_dict['data']['boards'][0]['groups'],
        record_path=['items_page', 'items'],  # one row per item
        meta=['title'])                       # carry the group title along
    print(df)
    
               id                          name                                      title
    0  1234564130         2000 Foo Bar: Mapping  123456-G123456.00 - 2000 Foo Bar: Mapping
    1  1234564156  2000.5 - 2000.5 Ground Model  123456-G123456.00 - 2000 Foo Bar: Mapping
    2  1234564074                 Computer 2023           123456-R12345.00 - Computer 2023
    3  1234564096       3000.1 - 3000.1 Veggies           123456-R12345.00 - Computer 2023
    4  1234564028                Dodge - Design          123456-T12345.00 - Dodge - Design
    5  1234564048                             -          123456-T12345.00 - Dodge - Design
    6  1234563996                        Task 1                                Group Title
    

    Then you can use Pandas' features like vectorized string methods:

    g_name = df['title'].str.split('-', n=2).str[2].str.lstrip()
    print(g_name)
    
    0    2000 Foo Bar: Mapping
    1    2000 Foo Bar: Mapping
    2            Computer 2023
    3            Computer 2023
    4           Dodge - Design
    5           Dodge - Design
    6                      NaN
    Name: title, dtype: object
    

    (NaN marks a missing value: "Group Title" contains no dashes, so the split has no third piece to take.)

    And you can select the elements you want:

    result = df.loc[df['name'].eq(g_name), 'id']
    print(result)
    
    0    1234564130
    2    1234564074
    4    1234564028
    Name: id, dtype: object
    

    This selection (passed to .loc) means: