Tags: python, pandas, dataframe, numpy

Drop row in dataframe if equal to previous row


I have a dataframe like the one below, where I have a daily count of points for each team. However, it's a tough task to earn points and on many days the points stay the same. Since I'm turning the dataframe into a chart, I want to remove the rows where the point values are the same as that of the previous day. So in this case we keep row 0, row 1 is the same so we omit it, then keep row 2 because it's different from row 1.

row date Team A Team B
0 07-01-24 2pts 2pts
1 07-02-24 2pts 2pts
2 07-03-24 4pts 2pts

And of course, we don't want to compare the date. The idea for what I want to accomplish is something like this, but keeping it as a dataframe:

df = [
    df.iloc[[i]] for i in df.index[1:] 
    if
    any(df.iloc[:, 1:].iloc[[i]] != df.iloc[:, 1:].shift(1).iloc[[i]])
]

Solution

  • Example

    For clarity, I've prepared a slightly different example.

    import pandas as pd
    data = {'row' : [0, 1, 2, 3, 4], 
            'date': ['07-01-24', '07-02-24', '07-03-24', '07-04-24', '07-05-24'], 
            'Team A': ['2pts', '2pts', '2pts', '4pts', '2pts'], 
            'Team B': '2pts'}
    df = pd.DataFrame(data)
    

    df

       row      date Team A Team B
    0    0  07-01-24   2pts   2pts
    1    1  07-02-24   2pts   2pts <-- delete
    2    2  07-03-24   2pts   2pts <-- delete
    3    3  07-04-24   4pts   2pts
    4    4  07-05-24   2pts   2pts
    

    Check that the rows labelled "delete" are the ones you expect to be dropped. If so, try the code below.

    Code

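    # Compare only the team columns, so the date is ignored
    tmp = df.filter(like='Team')
    # Keep rows of df where any team value differs from the previous row
    out = df[tmp.ne(tmp.shift()).any(axis=1)].reset_index(drop=True)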
    

    out

       row      date Team A Team B
    0    0  07-01-24   2pts   2pts
    1    3  07-04-24   4pts   2pts
    2    4  07-05-24   2pts   2pts
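
For reuse, the same idea can be wrapped in a small helper. This is only a sketch: the function name `drop_repeated_rows` and its `value_cols` parameter are made up for illustration and are not part of pandas. It assumes the rows are already sorted by date and that the columns to compare are passed explicitly rather than matched by name.

```python
import pandas as pd


def drop_repeated_rows(df, value_cols):
    """Keep only rows whose values in value_cols differ from the previous row.

    Hypothetical helper for illustration; assumes df is already in date order.
    """
    values = df[value_cols]
    # shift() moves every row down by one, so each row lines up with its
    # predecessor. The first row is compared against missing values and is
    # therefore always kept.
    changed = values.ne(values.shift()).any(axis=1)
    # Index the full frame so 'row' and 'date' are retained, then renumber.
    return df[changed].reset_index(drop=True)


df = pd.DataFrame({
    'row': [0, 1, 2, 3, 4],
    'date': ['07-01-24', '07-02-24', '07-03-24', '07-04-24', '07-05-24'],
    'Team A': ['2pts', '2pts', '2pts', '4pts', '2pts'],
    'Team B': ['2pts'] * 5,
})

out = drop_repeated_rows(df, ['Team A', 'Team B'])
print(out)
```

One caveat: missing values never compare equal to each other, so two consecutive rows that both contain NaN in a compared column are treated as different and both are kept.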