pandasdataframeforecastgfs

How to add new column in dataframe based on forecast cycle 00 and 12 UTC using pandas


I have a dataframe with data from two forecast cycles, one starting at 00 UTC and going up to 168 hours forecast (Forecast (valid time column) and another cycle starting at 12 UTC and also going up to 168 hours forecast. Based on dataframe that I have below, I would like to create a column called cycle that corresponds to the Date-Time column of which forecast cycle the data refers. For example:

Date-Time             Cycle
2020-07-16 00:00:00   00
2020-07-16 00:00:00   12

How can I do this?

My dataframe looks like this:

enter image description here

Array file


Solution

  • IIUC, you can use pandas.Series.where with pandas.Series.ffill :

    import numpy
    
    df = pd.read_csv("df.csv", sep=";", index_col=0, usecols=[0,1,2])
    ā€‹
    df['Date-Time'] = pd.to_datetime(df['Date-Time'])
    ā€‹
    #is it the start of the cycle ?
    m = df["Forecast (valid time)"].eq(0)
    
    df["Cycle"] = df["Date-Time"].dt.hour.where(m).ffill()
    

    Output :

    print(df.groupby("Cycle").head(5))
    
                 Date-Time  Forecast (valid time)  Cycle
    0  2020-07-16 00:00:00                    0.0      0
    1  2020-07-16 03:00:00                    3.0      0
    2  2020-07-16 06:00:00                    6.0      0
    3  2020-07-16 09:00:00                    9.0      0
    4  2020-07-16 12:00:00                   12.0      0
    57 2020-07-16 12:00:00                    0.0     12
    58 2020-07-16 15:00:00                    3.0     12
    59 2020-07-16 18:00:00                    6.0     12
    60 2020-07-16 21:00:00                    9.0     12
    61 2020-07-17 00:00:00                   12.0     12