pythonpandas

Create multiple rows based on column


I have a column in my dataframe that is called delivery period. The delivery period is supposed to be in the format 'Month Year' (January 2025). It shows as the string "2025-01-01 to 2025-12-31".

I need to identify where this occurs and create a new row for each month with the same data. For instance:

Price Curve     Text            Delivery Period
TEST            TEST        2025-01-01 to 2025-12-31
some curve      some text       June 2025
some curve      some text       July 2025
some curve      some text       August 2025
some curve      some text       September 2025
some curve      some text       October 2025
some curve      some text       November 2025
some curve      some text       December 2025 

to

Price Curve       Text            Delivery Period
TEST              TEST              January 2025
TEST              TEST              February 2025
TEST              TEST              March 2025
TEST              TEST              April 2025
TEST              TEST              May 2025
TEST              TEST              June 2025
TEST              TEST              July 2025
TEST              TEST              August 2025
TEST              TEST              September 2025
TEST              TEST              October 2025
TEST              TEST              November 2025
TEST              TEST              December 2025
some curve      some text           June 2025
some curve      some text           July 2025
some curve      some text           August 2025
some curve      some text           September 2025
some curve      some text           October 2025
some curve      some text           November 2025
some curve      some text           December 2025 

MRE:

data_dict_cols = {
    'Price Curve': [
        "TEST",
        "some curve",
        "some curve",
        "some curve",
        "some curve",
        "some curve",
        "some curve",
        "some curve"
    ],
    'Text': [
        "TEST",
        "some text",
        "some text",
        "some text",
        "some text",
        "some text",
        "some text",
        "some text"
    ],
    'Delivery Period': [
        "2025-01-01 to 2025-12-31",
        "June 2025",
        "July 2025",
        "August 2025",
        "September 2025",
        "October 2025",
        "November 2025",
        "December 2025"
    ]
}

df = pd.DataFrame(data_dict_cols)

Solution

  • You can identify the row with range and transform the range into the same format as "Delivery Period".

    mask = df['Delivery Period'].str.contains(' to ')
    
    # create a period range where it has date range
    df.loc[mask, 'range'] = df.loc[mask]['Delivery Period'].str.split(' to ')
    df.loc[mask, 'range'] = df.loc[mask].range.transform(lambda x: pd.period_range(x[0], x[1], freq='M'))
    
    # explode the period range and format to MMM yyyy
    df = df.explode('range')
    df['range'] = df.range.dt.strftime('%B %Y')
    
    # combine the new and original column 
    df['Delivery Period'] = df.range.combine_first(df['Delivery Period']).drop(columns='range')