python python-3.x pandas dataframe vaex

How to filter a vaex dataset by a list of numbers/categories


As an example, I have the following dataset (fake random data):

Index  category             value
1      dog                  5
2      cat                  22
3      Tasselled Wobbegong  44
4      cat                  66
5      Tasselled Wobbegong  5
6      dog                  23

I have this in a vaex dataframe. Now imagine I have 10,000 categories, not just 3. I want to filter my vaex dataframe by a list of categories, like so:

filter_category_list = ['cat','dog']
df = df[df.category in filter_category_list ]

(The code above doesn't work, but I imagine the solution would look similar.) I expect my output to be:

Index  category  value
1      dog       5
2      cat       22
4      cat       66
6      dog       23

Any idea how to achieve that with vaex?

Thanks for taking the time to read!


Solution

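  • Here are some ways to do this. The isin form is the one to try first on a vaex dataframe, since it tests every row against the list in one vectorized call:

    df[df['category'].isin(filter_category_list)]

    apply with a lambda expresses the same test, but it calls a Python function for each row, so it is likely to be much slower with 10,000 categories:

    df[df['category'].apply(lambda x: x in filter_category_list)]

    The query form with @ is pandas syntax, so it probably only helps if the data is in a pandas dataframe rather than a vaex one:

    df.query("category in @filter_category_list")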