pythonpandasdataframeffill

Concert quartile into monthly data for each id while fill other columns by the first month of quarter


I am trying to convert quartiles into month for each unique id in pandas by filling other columns with previous row values. I have seen this one Pandas: Add data for missing months, but it's for only one id(it doesn't work for multiple id), how can we do the same thing if we have multiple id? For instance I have a data like this

import numpy as np
import pandas as pd
index = [0,1,2,3]
id = pd.Series([1,1,2,2],index= index)
price = pd.Series([20,41,61,68],index= index)
date_month = pd.Series(['2021-01','2021-04','2021-01','2021-04'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df

But I want the output be like

index = [0,1,2,3,4,5, 6, 7,8,9,10,11]
id = pd.Series([1,1,1,1,1,1,2, 2, 2,2,2,2],index= index)
price = pd.Series([20,20,20, 41,41,41, 61,61, 61, 68,68,68],index= index)
date_month = pd.Series(['2021-01', '2021-02','2021-03', '2021-04', '2021-05','2021-06','2021-01', '2021-02', '2021-03','2021-04', '2021-05','2021-06'],index= index)
df = pd.DataFrame(id,columns = ["id"])
df["price"] =price
df['date_month'] = date_month
df 

FYI, there are other columns too, which we want fill by the value of the first month of the each quarter.


Solution

  • First, create a date range dataframe containing the months from the smallest month to the largest month.

    import datetime
    
    date_month = pd.to_datetime(df["date_month"])
    min_date = date_month.min()
    max_date = date_month.max() + pd.tseries.offsets.QuarterEnd()
    date_range_df = pd.DataFrame(pd.date_range(min_date, max_date, freq="M"), columns=["date"])
    date_range_df
    

    Then, create a quarter month map dataframe based on date_range_df

    def quarter2month(quarter):
        return (quarter -1) * 3 + 1
    
    quarter_month_map_df = date_range_df.assign(
        date_month = date_range_df["date"].map(lambda x:datetime.datetime.strftime(x,"%Y-%m")),
        date_quarter = date_range_df["date"].map(lambda x:f"{x.year}-{quarter2month(x.quarter):02}"),
    ).drop("date",axis=1)
    
    quarter_month_map_df
    

    The reuslt will be

     date_month date_quarter
    0 2021-01 2021-01
    1 2021-02 2021-01
    2 2021-03 2021-01
    3 2021-04 2021-04
    4 2021-05 2021-04
    5 2021-06 2021-04
    

    Finally, you can merge the quarter_month_map_df with original df

    pd.merge(df.rename({"date_month": "date_quarter"}, axis=1),
         quarter_month_map_df, on='date_quarter', how="left")