What is the pandas equivalent of the R operator %in%?
In R, we can check which rows of a data frame column contain one of several strings using the %in% operator, which returns a logical (Boolean) vector.
Concrete example: to check which rows of the species column of the iris dataset contain "setosa" or "virginica", we can simply use the following code:
iris[, c('species')] %in% c('setosa', 'virginica')
How can we do the same thing in Python for a pandas DataFrame?
I want to do this so that I can filter the dataset and keep only the rows with the species "setosa" or "virginica".
A pandas Series has an .isin() method, which is equivalent to the %in% operator in R. It is not part of the .str accessor: as pointed out by @rhug123, .isin is applied directly to the Series. I have updated the code below accordingly.
Your R code above can be implemented in Python using pandas as follows, assuming that iris is a pandas DataFrame:
iris.species.isin(['setosa', 'virginica'])
You can then filter your DataFrame and only keep the rows with species 'setosa' or 'virginica' as follows:
iris[iris.species.isin(['setosa', 'virginica'])]
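For anyone who wants to try this without loading the real dataset, here is a minimal, self-contained sketch. The small DataFrame is a made-up stand-in for iris (the values and the sepal_length column are only illustrative), and the variable names wanted, mask, kept and dropped are my own. It also shows negating the mask with ~, which should correspond to !(x %in% y) in R.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; only the species column matters here.
iris = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9, 5.8],
    "species": ["setosa", "versicolor", "virginica", "setosa", "versicolor"],
})

wanted = ["setosa", "virginica"]

# Boolean Series with one entry per row (the analogue of R's %in%).
mask = iris["species"].isin(wanted)

# Keep only the rows whose species is in the list.
kept = iris[mask]

# Negate the mask with ~ to keep every other row instead.
dropped = iris[~mask]

print(mask.tolist())
print(kept)
print(dropped)
```

Bracket access (iris["species"]) is used here instead of iris.species because it also works when a column name contains spaces or clashes with an existing DataFrame attribute.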