I have a column in my dataframe that is called delivery period. The delivery period is supposed to be in the format 'Month Year' (January 2025). It shows as the string "2025-01-01 to 2025-12-31".
I need to identify where this occurs and create a new row for each month with the same data. For instance:
Price Curve Text Delivery Period
TEST TEST 2025-01-01 to 2025-12-31
some curve some text June 2025
some curve some text July 2025
some curve some text August 2025
some curve some text September 2025
some curve some text October 2025
some curve some text November 2025
some curve some text December 2025
to
Price Curve Text Delivery Period
TEST TEST January 2025
TEST TEST February 2025
TEST TEST March 2025
TEST TEST April 2025
TEST TEST May 2025
TEST TEST June 2025
TEST TEST July 2025
TEST TEST August 2025
TEST TEST September 2025
TEST TEST October 2025
TEST TEST November 2025
TEST TEST December 2025
some curve some text June 2025
some curve some text July 2025
some curve some text August 2025
some curve some text September 2025
some curve some text October 2025
some curve some text November 2025
some curve some text December 2025
MRE:
data_dict_cols = {
'Price Curve': [
"TEST",
"some curve",
"some curve",
"some curve",
"some curve",
"some curve",
"some curve",
"some curve"
],
'Text': [
"TEST",
"some text",
"some text",
"some text",
"some text",
"some text",
"some text",
"some text"
],
'Delivery Period': [
"2025-01-01 to 2025-12-31",
"June 2025",
"July 2025",
"August 2025",
"September 2025",
"October 2025",
"November 2025",
"December 2025"
]
}
df = pd.DataFrame(data_dict_cols)
You can identify the row with range and transform the range into the same format as "Delivery Period".
mask = df['Delivery Period'].str.contains(' to ')
# create a period range where it has date range
df.loc[mask, 'range'] = df.loc[mask]['Delivery Period'].str.split(' to ')
df.loc[mask, 'range'] = df.loc[mask].range.transform(lambda x: pd.period_range(x[0], x[1], freq='M'))
# explode the period range and format to MMM yyyy
df = df.explode('range')
df['range'] = df.range.dt.strftime('%B %Y')
# combine the new and original column
df['Delivery Period'] = df.range.combine_first(df['Delivery Period']).drop(columns='range')