Subsetting a dataframe using column sum in Python


I have a sparse data frame with more than 500 columns. I want to remove the columns whose entries sum to less than a threshold value, say 100. How can I do this in Python?

In R I can achieve this using:

df2 <- df51[,colSums(df51) >= 100]  

Solution

  • In Python (with pandas), that translates to

    df2 = df1.drop(df1.columns[df1.sum() < 100], axis=1)
    

    The axis=1 option tells drop to remove columns; axis=0 (the default) removes rows.
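
A closer match to the R one-liner is to keep the columns that meet the threshold rather than drop the ones that do not. The sketch below assumes pandas is installed; the toy frame and the names `threshold` and `keep` are made up for illustration and stand in for the real 500-column data.

```python
import pandas as pd

# Hypothetical toy data standing in for the real 500-column frame.
df1 = pd.DataFrame({
    "a": [60, 50, 0],   # sum = 110, kept
    "b": [1, 0, 2],     # sum = 3, removed
    "c": [100, 0, 0],   # sum = 100, kept (the threshold is inclusive)
})

threshold = 100

# Boolean Series indexed by column name: True where the column sum
# meets the threshold. This plays the role of colSums(df) >= 100 in R.
keep = df1.sum() >= threshold

# Select all rows and only the columns where the mask is True.
df2 = df1.loc[:, keep]

print(list(df2.columns))  # ['a', 'c']
```

Selecting with the mask avoids having to negate the condition, which is easy to get backwards when using drop.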