I'm working with the Asana Python API to pull a list of tasks and stories, which are returned as dictionary objects. There is a 1:many relationship between tasks and stories. I want to create a dataframe which has each story as one record, but with the addition of the parent task gid (unique identifier). This will let me join tasks and stories to build my report.
I've created a nested loop which allows me to retrieve and print the stories associated with each task id. But rather than print the records, what I want is to build a new dataframe where each row has the parent task ID and one associated story. Nothing I've tried so far can produce that.
# get dictionary with all tasks associated with project & load into df
tasks = client.tasks.find_by_project("project id")
tasksDF = pd.DataFrame.from_dict(tasks, orient='columns')
# iterate through task list and retrieve all associated stories
for ind in tasksDF.index:
print(tasksDF['gid'][ind])
storiesDF = pd.DataFrame.from_dict(client.stories.find_by_task(tasksDF['gid'][ind]))
for story in storiesDF.index:
print(storiesDF)
Edit: here is the format of the first input dataframe tasksDF:
gid name
0 1202691502803495 Improve emails for qualification, post-order...
1 1202691502803501 Focus on adopt-expand suggestions on website
2 1202691502803507 Amplify community member voices
Here is the format of the second input dataframe storiesDF:
gid created_at resource_type
0 1202691619765536 2022-07-29T19:22:59.755Z story
1 1202691619854442 2022-07-29T19:23:06.663Z story
2 1202691619974673 2022-07-29T19:23:11.636Z story
3 1202691516738152 2022-07-29T19:24:09.938Z story
So each record in tasksDF is the parent to one or more stories in storiesDF. I want to create a new dataframe which is the same as storiesDF, but with the addition of a column containing the matching parent gid from tasksDF.
When I print using the code above, I get the elements I want (parent gid followed by multiple stories) but I can't figure out how to combine to build the dataframe I want.
1202691502803501
gid created_at resource_type
0 1202691619231667 2022-07-29T19:22:37.677Z story
1 1202691568061156 2022-07-29T19:22:49.486Z story
2 1202691516700419 2022-07-29T19:24:02.564Z story
3 1202824731913557 2022-08-18T22:20:03.991Z story
As stated above, I can't merge (or join) two dataframes since they had no shared columns to join on. I thought about this some more and used pd.at to locate specific cell values, and pd.append to build the new dataframe inside the inner loop.
# Get all the tasks for a given project and save to df
tasks = client.tasks.find_by_project("12345678")
tasksDF = pd.DataFrame.from_dict(tasks, orient='columns')
# Create an empty dataframe to hold the stories
reportDF = pd.DataFrame(columns={'TaskGID','TaskName','StoryGID','StoryAction','StoryText'})
# Loop through the tasks, retrieving stories for each one
for task in tasksDF.index:
# first save this Task ID as a single variable
taskID = tasksDF.at[task,'gid']
# then pull all the stories for that task ID
storiesDF = pd.DataFrame.from_dict(client.stories.find_by_task(taskID))
# then loop through the stories, saving each one to the dataframe including the parent Task ID
for story in storiesDF.index:
reportDF = reportDF.append({'TaskGID':taskID, 'TaskName':tasksDF.at[task,'name'], 'StoryGID':storiesDF.at[story,'gid'], \
'StoryAction':storiesDF.at[story,'resource_subtype'],'StoryText':storiesDF.at[story,'text'], 'Created':storiesDF.at[story,'created_at']}, \
ignore_index=True)
Now I hold the taskID value while I find all the matching stories, then write out one record per story including the parent taskID.
Probably neater ways to do this but it works for me. This is just the basic output showing the 'merge' - I added more fields in the code above:
TaskGID StoryGID
0 1202691502803501 1202691619231667
1 1202691502803501 1202691568061156
2 1202691502803501 1202691516700419
3 1202691502803501 1202824731913557
4 1202691502803507 1202691569029125
5 1202691502803507 1202691620978261
6 1202691502803507 1202691569887538
7 1202691502803507 1202824995319240