pythonpandasprimavera

Insert rows in Pandas DF conditionally


I am exporting a schedule from P6, and am in the process of cleaning the data in Excel in prep for another application. I created a new column titled 'Parent', and am using that as the basis for my grouping.

The application I am going to use for this dataframe is Milestones Professional, which requires parent rows in the dataframe in order to indent properly. I have a column titled 'Parent', and any group which has a sub-parent is identified like the following: "Parent name > Sub-Parent name> etc."

import pandas as pd

#example of current dateframe

data = [['Milestone 1', 'Major Milestones', '01/01/2022', '01/02/2022'], ['Milestone 2', 'Major Milestones 2', '01/03/2022', '01/04/2022'], ['Milestone 3', 'Major Milestones>April MS', '01/05/2022', '01/06/2022'], ['Milestone 4', 'Major Milestones>April MS', '01/07/2022', '01/07/2022'], ['Milestone 5', 'Late Milestones', '01/08/2022', '01/09/2022'], ['Milestone 7', 'Late Milestones', '01/09/2022', '01/10/2022'], ['Milestones 8', 'Late Milestones>June MS', '01/13/2022', '01/14/2022'], ['Milestone 9', 'Late Milestones>June MS', '01/15/2022', '01/16/2022'], ['Milestone 10', 'Late Milestones>July MS', '01/19/2022', '01/20/2022']]

df = pd.DataFrame(data, columns=['Activity', 'Parent', 'Start', 'Finish'])

df

   Activity       Parent                     Start       Finish
0  Milestone 1    Major Milestones           01/01/2022  01/02/2022
1  Milestone 2    Major Milestones 2         01/03/2022  01/04/2022
2  Milestone 3    Major Milestones>April MS  01/05/2022  01/06/2022
3  Milestone 4    Major Milestones>April MS  01/07/2022  01/07/2022
4  Milestone 5    Late Milestones            01/08/2022  01/09/2022
5  Milestone 7    Late Milestones            01/09/2022  01/10/2022
6  Milestones 8   Late Milestones>June MS    01/13/2022  01/14/2022
7  Milestone 9    Late Milestones 9          01/15/2022  01/16/2022
8  Milestone 10   Late Milestones>July MS    01/19/2022  01/20/2022

#ideal dataframe

data_1 = [['Major Milestones', '', '01/01/2022', '01/07/2022'], ['Milestone 1', 'Major Milestones', '01/01/2022', '01/02/2022'], ['Milestone 2', 'Major Milestones', '01/03/2022', '01/04/2022'], ['April MS', '', '01/05/2022', '01/07/2022'], ['Milestone 3', 'Major Milestones>April MS', '01/05/2022', '01/06/2022'], ['Milestone 4', 'Major Milestones>April MS', '01/07/2022', '01/07/2022'], ['Late Milestones', '', '01/08/2022', '01/20/2022'], ['Milestone 5', 'Late Milestones', '01/08/2022', '01/09/2022'], ['June MS', '', '01/11/2022', '01/14/2022'],  ['Milestone 6', 'Late Milestones>June MS', '01/10/2022', '01/11/2022'], ['Milestone 7', 'Late Milestones>June MS', '01/09/2022', '01/10/2022'], ['Milestones 8', 'Late Milestones>June MS', '01/13/2022', '01/14/2022'], ['Milestone 9', 'Late Milestones>June MS', '01/15/2022', '01/16/2022'], ['July MS', '', '01/17/2022', '01/20/2022'], ['Milestone 10', 'Late Milestones>July MS', '01/19/2022', '01/20/2022']]

df_1 = pd.DataFrame(data_1, columns=['Activity', 'Parent', 'Start', 'Finish'])

df_1
       Activity                Parent                      Start       Finish
0  Major Milestones                                 01/01/2022  01/07/2022
1  Milestone 1         Major Milestones           01/01/2022  01/02/2022
2  Milestone 2         Major Milestones           01/03/2022  01/04/2022
3  April MS                                         01/05/2022  01/07/2022
4  Milestone 3         Major Milestones>April MS   01/05/2022  01/06/2022
5  Milestone 4         Major Milestones>April MS   01/07/2022  01/07/2022
6  Late Milestones                                  01/08/2022  01/20/2022
7  Milestone 5         Late Milestones            01/08/2022  01/09/2022
8  June MS                                           01/11/2022  01/14/2022
9  Milestone 6         Late Milestones>June MS     01/10/2022  01/11/2022
10 Milestone 7         Late Milestones>June MS     01/09/2022  01/10/2022
11 Milestones 8        Late Milestones>June MS     01/13/2022  01/14/2022
12 Milestone 9         Late Milestones>June MS     01/15/2022  01/16/2022
13 July MS                                           01/17/2022  01/20/2022
14 Milestone 10        Late Milestones>July MS     01/19/2022  01/20/2022

I am trying to insert new rows identifying the major parent above the different groups, plant the start and finish date for that group, and also put rows identifying the sub-parents and the start and finish dates for those groups (sorry I know that's probably worded terribly). I tried creating a condition that searches in the parent column and creates a new row with the name of the parent for each group above but with no success.

My apologies for the formatting of the previous code snippet, it didn't come over well from VS.


Solution

  • If you want the original propose format, you could use extract, slicing and concat:

    s = df['Parent'].str.extract('>(.*)$', expand=False).fillna(df['Parent'])
    out = pd.concat([s[s.ne(s.shift())].to_frame(name='Activity'), df]).sort_index()
    

    Output:

               Activity                     Parent       Start      Finish
    0  Major Milestones                        NaN         NaN         NaN
    0       Milestone 1           Major Milestones  01/01/2022  01/02/2022
    1       Milestone 2           Major Milestones  01/03/2022  01/04/2022
    2          April MS                        NaN         NaN         NaN
    2       Milestone 3  Major Milestones>April MS  01/05/2022  01/06/2022
    3       Milestone 4  Major Milestones>April MS  01/07/2022  01/07/2022
    4   Late Milestones                        NaN         NaN         NaN
    4       Milestone 5            Late Milestones  01/08/2022  01/09/2022
    5       Milestone 7            Late Milestones  01/09/2022  01/10/2022
    6           June MS                        NaN         NaN         NaN
    6      Milestones 8    Late Milestones>June MS  01/13/2022  01/14/2022
    7       Milestone 9    Late Milestones>June MS  01/15/2022  01/16/2022
    8           July MS                        NaN         NaN         NaN
    8      Milestone 10    Late Milestones>July MS  01/19/2022  01/20/2022