
How can I add rows for all dates between two columns?


import pandas as pd

mydata = [{'ID' : '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
          {'ID' : '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]

mydata2 = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '10/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '11/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '12/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '13/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '14/10/2016'},
           {'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016', 'Date': '15/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '10/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '11/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '12/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '13/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '14/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '15/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '16/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '17/10/2016'},
           {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016', 'Date': '18/10/2016'},]

df = pd.DataFrame(mydata)
df2 = pd.DataFrame(mydata2)

I can't find an answer on how to change 'df' into 'df2'. Maybe I'm not phrasing it right.

For each row, I want to take every date from 'Entry Date' through 'Exit Date' (both inclusive), create one row per date, and put that date in a new column, 'Date'. The dates are in day/month/year format.

Any help would be greatly appreciated.
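The per-row requirement comes down to a daily date range that includes both endpoints. Here is a small sketch, assuming the dates are day-first as in the sample data. For each ID, pd.date_range produces exactly the 'Date' values listed in mydata2. The name ranges is only for illustration.

```python
import pandas as pd

mydata = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
          {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]

ranges = {}
for row in mydata:
    # parse day-first strings explicitly so 15/10/2016 means 15 October
    start = pd.to_datetime(row['Entry Date'], format='%d/%m/%Y')
    end = pd.to_datetime(row['Exit Date'], format='%d/%m/%Y')
    # freq='D' gives one timestamp per day; both endpoints are included
    days = pd.date_range(start, end, freq='D')
    # format back to the question's string style for easy comparison
    ranges[row['ID']] = list(days.strftime('%d/%m/%Y'))

print(ranges['10'])
```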


Solution

  • You can use melt to reshape, then set_index and drop the variable column:

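    # convert columns to datetime; the dates are day-first (e.g. 15/10/2016),
    # so pass the format explicitly instead of letting pandas guess
    df['Entry Date'] = pd.to_datetime(df['Entry Date'], format='%d/%m/%Y')
    df['Exit Date'] = pd.to_datetime(df['Exit Date'], format='%d/%m/%Y')
    
    # reshape to long form: one row per (ID, date) pair; 'Date' is already datetime
    df2 = pd.melt(df, id_vars='ID', value_name='Date')
    df2.set_index('Date', inplace=True)
    # 'variable' only holds the source column names ('Entry Date'/'Exit Date')
    df2.drop('variable', axis=1, inplace=True)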
    print (df2)
                ID
    Date          
    2016-10-10  10
    2016-10-10  20
    2016-10-15  10
    2016-10-18  20
    

    Then use groupby with resample('D') and ffill to add the missing days for each ID:

    df3 = df2.groupby('ID').resample('D').ffill().reset_index(level=0, drop=True).reset_index()
    print (df3)
             Date  ID
    0  2016-10-10  10
    1  2016-10-11  10
    2  2016-10-12  10
    3  2016-10-13  10
    4  2016-10-14  10
    5  2016-10-15  10
    6  2016-10-10  20
    7  2016-10-11  20
    8  2016-10-12  20
    9  2016-10-13  20
    10 2016-10-14  20
    11 2016-10-15  20
    12 2016-10-16  20
    13 2016-10-17  20
    14 2016-10-18  20
    

    Finally, merge with the original DataFrame (the common column ID is used as the key):

    print (pd.merge(df, df3))
       Entry Date  Exit Date  ID       Date
    0  2016-10-10 2016-10-15  10 2016-10-10
    1  2016-10-10 2016-10-15  10 2016-10-11
    2  2016-10-10 2016-10-15  10 2016-10-12
    3  2016-10-10 2016-10-15  10 2016-10-13
    4  2016-10-10 2016-10-15  10 2016-10-14
    5  2016-10-10 2016-10-15  10 2016-10-15
    6  2016-10-10 2016-10-18  20 2016-10-10
    7  2016-10-10 2016-10-18  20 2016-10-11
    8  2016-10-10 2016-10-18  20 2016-10-12
    9  2016-10-10 2016-10-18  20 2016-10-13
    10 2016-10-10 2016-10-18  20 2016-10-14
    11 2016-10-10 2016-10-18  20 2016-10-15
    12 2016-10-10 2016-10-18  20 2016-10-16
    13 2016-10-10 2016-10-18  20 2016-10-17
    14 2016-10-10 2016-10-18  20 2016-10-18
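An alternative sketch skips the melt/resample round trip. This is a different technique from the answer above, and it assumes pandas 1.1 or later for the ignore_index argument of DataFrame.explode. It builds each row's list of days with pd.date_range, then explode turns each list element into its own row and repeats the other columns. The name out is only for illustration.

```python
import pandas as pd

mydata = [{'ID': '10', 'Entry Date': '10/10/2016', 'Exit Date': '15/10/2016'},
          {'ID': '20', 'Entry Date': '10/10/2016', 'Exit Date': '18/10/2016'}]
df = pd.DataFrame(mydata)

# dates are day-first, so parse them with an explicit format
for col in ['Entry Date', 'Exit Date']:
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y')

# one list of daily timestamps per row, Entry Date through Exit Date inclusive
df['Date'] = [list(pd.date_range(start, end, freq='D'))
              for start, end in zip(df['Entry Date'], df['Exit Date'])]

# explode gives each list element its own row; ID and the two dates are repeated
out = df.explode('Date', ignore_index=True)
# explode leaves an object column of Timestamps; convert back to datetime64
out['Date'] = pd.to_datetime(out['Date'])
print(out)
```

Because pd.date_range includes both endpoints by default, ID 10 gets 6 rows and ID 20 gets 9. This matches the rows of df2 in the question, except that 'Date' holds real datetimes rather than strings.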