How can I identify changes in stock in a pandas DataFrame?


I am working with a pandas DataFrame that has 3 important columns: AmountOfStock, the number of available units; ProductType, the code of the product; and DateTime, the date and time at which the record was sent to the database. The database records the stock of each product every 10 seconds, so some rows look like this:

1. 2023-11-16 10:00:00, ProductA, 30
2. 2023-11-16 10:00:00, ProductB, 15
3. 2023-11-16 10:00:10, ProductA, 29
4. 2023-11-16 10:00:10, ProductB, 15
5. 2023-11-16 10:00:20, ProductA, 29
6. 2023-11-16 10:00:20, ProductB, 14

I want to keep only the initial row for each product and the rows in which that product's quantity changes. In this example, that means removing the 4th and 5th rows. Can someone please tell me how to do this?


Solution

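  • You can combine groupby and shift to achieve that: compare each row's AmountOfStock with the previous value for the same ProductType and keep the rows where the two differ. The first row of each product has no previous value (shift returns NaN), so it is always kept. Here is the sample code: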

    import pandas as pd
     
    data = {
        'DateTime': ['2023-11-16 10:00:00', '2023-11-16 10:00:00', '2023-11-16 10:00:10', '2023-11-16 10:00:10', '2023-11-16 10:00:20', '2023-11-16 10:00:20'],
        'ProductType': ['ProductA', 'ProductB', 'ProductA', 'ProductB', 'ProductA', 'ProductB'],
        'AmountOfStock': [30, 15, 29, 15, 29, 14]
    }
    
    df = pd.DataFrame(data)
     
    df['DateTime'] = pd.to_datetime(df['DateTime'])
    
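    # Previous stock value of the same product (NaN for each product's first row).
    # This assumes the rows are already in chronological order.
    previous = df.groupby('ProductType')['AmountOfStock'].shift(1)

    # Keep rows where the stock differs from the previous value; NaN never
    # compares equal, so the initial row of each product is kept as well.
    result = df[df['AmountOfStock'] != previous]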
    
    print(result)
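
The comparison above relies on the rows already being in chronological order within each product. If that is not guaranteed, one option is to sort before shifting. The sketch below is only an illustration under that assumption: the helper name keep_stock_changes and the reversed sample frame are made up for this example and are not part of the original answer.

```python
import pandas as pd


def keep_stock_changes(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each product's first row and every row where its stock changes.

    Hypothetical helper: column names follow the question's DataFrame.
    """
    # Sort so that shift() compares each reading with the previous one in time.
    ordered = df.sort_values(["ProductType", "DateTime"])
    # Previous stock value within the same product (NaN for the first reading).
    previous = ordered.groupby("ProductType")["AmountOfStock"].shift(1)
    # NaN never compares equal, so the first reading of each product is kept.
    changed = ordered["AmountOfStock"] != previous
    # Return the surviving rows ordered by their original index labels.
    return ordered[changed].sort_index()


sample = pd.DataFrame({
    "DateTime": ["2023-11-16 10:00:00", "2023-11-16 10:00:00",
                 "2023-11-16 10:00:10", "2023-11-16 10:00:10",
                 "2023-11-16 10:00:20", "2023-11-16 10:00:20"],
    "ProductType": ["ProductA", "ProductB", "ProductA",
                    "ProductB", "ProductA", "ProductB"],
    "AmountOfStock": [30, 15, 29, 15, 29, 14],
})
sample["DateTime"] = pd.to_datetime(sample["DateTime"])

# Reverse the rows to simulate data that arrives out of order.
shuffled = sample.iloc[::-1]

result = keep_stock_changes(shuffled)
print(result)
```

With the sample data from the question, this keeps the rows with index labels 0, 1, 2 and 5, which are the 1st, 2nd, 3rd and 6th rows, so the 4th and 5th rows are dropped as requested.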
    
