pythonpandasasana-api

How to iterate through two dataframes create new one?


I'm working with the Asana Python API to pull a list of tasks and stories, which are returned as dictionary objects. There is a 1:many relationship between tasks and stories. I want to create a dataframe which has each story as one record, but with the addition of the parent task gid (unique identifier). This will let me join tasks and stories to build my report.

I've created a nested loop which allows me to retrieve and print the stories associated with each task id. But rather than print the records, what I want is to build a new dataframe where each row has the parent task ID and one associated story. Nothing I've tried so far can produce that.

# get dictionary with all tasks associated with project & load into df
tasks = client.tasks.find_by_project("project id")
tasksDF = pd.DataFrame.from_dict(tasks, orient='columns')

# iterate through task list and retrieve all associated stories
for ind in tasksDF.index:
    print(tasksDF['gid'][ind])
    storiesDF = pd.DataFrame.from_dict(client.stories.find_by_task(tasksDF['gid'][ind]))
    for story in storiesDF.index:
        print(storiesDF)

Edit: here is the format of the first input dataframe tasksDF:

   gid               name
0  1202691502803495  Improve emails  for qualification,  post-order...
1  1202691502803501  Focus on adopt-expand  suggestions  on website 
2  1202691502803507  Amplify  community member voices

Here is the format of the second input dataframe storiesDF:

   gid               created_at                       resource_type
0  1202691619765536  2022-07-29T19:22:59.755Z         story
1  1202691619854442  2022-07-29T19:23:06.663Z         story
2  1202691619974673  2022-07-29T19:23:11.636Z         story
3  1202691516738152  2022-07-29T19:24:09.938Z         story

So each record in tasksDF is the parent to one or more stories in storiesDF. I want to create a new dataframe which is the same as storiesDF, but with the addition of a column containing the matching parent gid from tasksDF.

When I print using the code above, I get the elements I want (parent gid followed by multiple stories) but I can't figure out how to combine to build the dataframe I want.

1202691502803501
                gid                created_at resource_type
0  1202691619231667  2022-07-29T19:22:37.677Z         story
1  1202691568061156  2022-07-29T19:22:49.486Z         story
2  1202691516700419  2022-07-29T19:24:02.564Z         story
3  1202824731913557  2022-08-18T22:20:03.991Z         story

Solution

  • As stated above, I can't merge (or join) two dataframes since they had no shared columns to join on. I thought about this some more and used pd.at to locate specific cell values, and pd.append to build the new dataframe inside the inner loop.

    # Get all the tasks for a given project and save to df
    tasks = client.tasks.find_by_project("12345678")
    tasksDF = pd.DataFrame.from_dict(tasks, orient='columns')
    
    # Create an empty dataframe to hold the stories
    reportDF = pd.DataFrame(columns={'TaskGID','TaskName','StoryGID','StoryAction','StoryText'})
    
    # Loop through the tasks, retrieving stories for each one
    for task in tasksDF.index:
    #   first save this Task ID as a single variable
        taskID = tasksDF.at[task,'gid']
    #   then pull all the stories for that task ID
        storiesDF = pd.DataFrame.from_dict(client.stories.find_by_task(taskID))
    #   then loop through the stories, saving each one to the dataframe including the parent Task ID
        for story in storiesDF.index:
            reportDF = reportDF.append({'TaskGID':taskID, 'TaskName':tasksDF.at[task,'name'], 'StoryGID':storiesDF.at[story,'gid'], \
                    'StoryAction':storiesDF.at[story,'resource_subtype'],'StoryText':storiesDF.at[story,'text'], 'Created':storiesDF.at[story,'created_at']}, \
                    ignore_index=True)
    

    Now I hold the taskID value while I find all the matching stories, then write out one record per story including the parent taskID.

    Probably neater ways to do this but it works for me. This is just the basic output showing the 'merge' - I added more fields in the code above:

                TaskGID          StoryGID
    0   1202691502803501  1202691619231667
    1   1202691502803501  1202691568061156
    2   1202691502803501  1202691516700419
    3   1202691502803501  1202824731913557
    4   1202691502803507  1202691569029125
    5   1202691502803507  1202691620978261
    6   1202691502803507  1202691569887538
    7   1202691502803507  1202824995319240