pythonpandasdataframepivot-tablenp.argsort

Python dataframe returning closest value above specified input in one row (pivot_table)


I have the following DataFrame, output_excel, containing inventory data and sales data for different products. See the DataFrame below:

    Product   2022-04-01  2022-05-01  2022-06-01  2022-07-01  2022-08-01  2022-09-01 AvgMonthlySales  Current Inventory
1   BE37908   1500        1400        1200        1134        1110        1004       150.208333       1500      
2   BE37907   2000        1800        1800        1540        1300        1038       189.562500       2000
3   DE37907   5467        5355        5138        4926        4735        4734       114.729167       5467

Please note that that in my example, today's date is 2022-04-01, so all inventory numbers for the months May through September are predicted values, while the AvgMonthlySales are the mean of actual, past sales for that specific product. The current inventory just displays today's value.

I also have another dataframe, df2, containing the lead time, the same sales data, and the calculated security stock for the same products. The formula for the security stock is ((leadtime in weeks / 4) + 1) * AvgMonthlySales:

     Product   AvgMonthlySales   Lead time in weeks   Security Stock
1    BE37908   250.208333        16                   1251.04166
2    BE37907   189.562500        24                   1326.9375
3    DE37907   114.729167        10                   401.552084

What I am trying to achieve:

I want to create a new dataframe, which tells me how many months are left until our inventory drops below the security stock. For example, for the first product, BE37908, the security stock is ~1251 units, and by 2022-06-01 our inventory will drop below that number. So I want to return 2022-05-01, as this is the last month where our inventories are projected to be above the security stock. The whole output should look something like this:

    Product   Last Date Above Security Stock
1   BE37908   2022-05-01
2   BE37907   2022-07-01
3   DE37907   NaN

Please also note that the timeframe for the projections (the columns) can be set by the user, so we couldn't just select columns 2 through 7. However, the Product column will always be the first one, and the AvgMonthlySales and the Current Inventory columns will always be the last two.

To recap, I want to return the column with the smallest value above the security stock for each product. I have an idea on how to do that by column using argsort, but not by row. What is the best way to achieve this? Any tips?


Solution

  • You could try as follows:

    # create list with columns with dates
    cols = [col for col in df.columns if col.startswith('20')]
    
    # select cols, apply df.gt row-wise, sum and subtract 1
    idx = df.loc[:,cols].gt(df2['Security Stock'], axis=0).sum(axis=1).sub(1)
    
    # get the correct dates from the cols
    # if the value == len(cols)-1, *all* values will have been greater so: np.nan
    idx = [cols[i] if i != len(cols)-1 else np.nan for i in idx]
    
    out = df['Product'].to_frame()
    out['Last Date Above Security Stock'] = idx
    
    print(out)
    
       Product Last Date Above Security Stock
    1  BE37908                     2022-05-01
    2  BE37907                     2022-07-01
    3  DE37907                            NaN