What is the pandas equivalent of the R function %in%?


When we have a data frame in R, we can check which rows of a column contain strings from a list using the %in% operator, which returns a Boolean vector.

Concrete example: to check which rows of the iris dataset have "setosa" or "virginica" in the column species, we can simply use the following code:

iris[, c('species')] %in% c('setosa', 'virginica')

How can we do the same thing in Python for a pandas DataFrame?

I want to do this so that I can filter the dataset and keep only the rows with the species "setosa" or "virginica".


Solution

  • A pandas Series has the .isin() method, which is equivalent to the %in% operator in R. As pointed out by @rhug123, .isin() is applied directly to the Series; the .str accessor is not needed (and does not provide .isin()). The code below reflects this.

    Assuming that iris is a pandas DataFrame, your R code above can be written in Python as follows:

    iris.species.isin(['setosa', 'virginica'])

    You can then filter your DataFrame and only keep the rows with species 'setosa' or 'virginica' as follows:

    iris[iris.species.isin(['setosa', 'virginica'])]
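As a self-contained sketch of the two steps above: the small DataFrame here is a made-up stand-in for the iris data (the rows and the names `wanted`, `mask`, `filtered` and `others` are illustrative assumptions, not part of the original answer). It also shows the negated form, which corresponds to `!(x %in% y)` in R.

```python
import pandas as pd

# Toy stand-in for the iris data (hypothetical rows, not the real dataset).
iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9, 6.4],
    "species": ["setosa", "versicolor", "virginica", "setosa", "versicolor"],
})

wanted = ["setosa", "virginica"]

# Boolean Series: True where species is one of the wanted values.
mask = iris["species"].isin(wanted)

# Keep only the matching rows.
filtered = iris[mask]

# Negate the mask with ~ to keep everything else instead.
others = iris[~mask]

print(mask.tolist())
print(filtered)
```

Bracket access (`iris["species"]`) is used here instead of attribute access (`iris.species`); both work for this column, but bracket access also works when a column name contains spaces or clashes with a DataFrame method.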