I am cleaning a pandas DataFrame imported from a .csv file. The first and second columns hold useful data, and columns 3-5 are junk. This pattern repeats every five columns: the first two columns of each group of five are useful, and the last three are junk. I can remove the junk columns using the code below:
df1 = df.drop(columns=df.columns[4::5])
df1 = df1.drop(columns=df1.columns[3::4])
df1 = df1.drop(columns=df1.columns[2::3])
Is there a solution to do this all in one line?
I think three lines is fine. The code won't get any clearer or faster from putting it all on one line.
Of course, you can always do:
columns = df.columns[:]
df1 = df.drop(columns=columns[4::5]).drop(columns=columns[3::5]).drop(columns=columns[2::5])
which I think also makes it clearer that you intend to drop the fifth, fourth and third column in every group of five.
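If a single expression is really the goal, another option is to select the columns you want to keep by position instead of dropping the junk. This is only a minimal sketch, assuming the pattern is strictly positional (positions 0 and 1 of every block of five are kept); the 10-column frame and the `c0`..`c9` labels are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-column frame standing in for the imported CSV.
df = pd.DataFrame(
    np.arange(20).reshape(2, 10),
    columns=[f"c{i}" for i in range(10)],
)

# Boolean mask over column positions: True for positions 0 and 1
# of every block of five, False for positions 2, 3 and 4.
keep = np.arange(df.shape[1]) % 5 < 2

# Select the kept columns in one step.
df1 = df.loc[:, keep]
```

Because the mask is built from positions rather than labels, it does not depend on what the columns are called, and it can be inlined as `df.loc[:, np.arange(df.shape[1]) % 5 < 2]` if you want it on one line.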