regex pandas exact-match

Pandas exact string matching function?


Does pandas have a built-in string function that matches exact text rather than a regex? In the code below, tropical_two comes out slightly higher than tropical. The documentation says str.count does a regex search.

tropical = reviews['description'].map(lambda x: "tropical" in x).sum()
print(tropical)
tropical_two = reviews['description'].str.count("tropical").sum()
print(tropical_two)

The first way is the answer key from Kaggle, but it seems less readable and intuitive to me than a .str function. When I run the function below, it returns True instead of 2, so I am a little confused about whether the answer-key method actually counts every occurrence of "tropical" or only the first.

def in_str(text):
    return "tropical" in text

in_str("tropical is tropical")

First 2 rows of the dataframe:

 0  Italy   Aromas include tropical fruit, broom, brimston...   Vulkà Bianco    87  NaN Sicily & Sardinia   Etna    NaN Kerin O’Keefe   @kerinokeefe    Nicosia 2013 Vulkà Bianco (Etna)    White Blend Nicosia
    1   Portugal    This is ripe and fruity, a wine that is smooth...   Avidagos    87  15.0    Douro   NaN NaN Roger Voss  @vossroger  Quinta dos Avidagos 2011 Avidagos Red (Douro)   Portuguese Red  Quinta dos Avidagos

The notebook is here (the tropical code is in cell #2): https://www.kaggle.com/mikexie0/exercise-summary-functions-and-maps


Solution

  • You may use str.count with word boundary markers to match the exact search term:

    tropical_two = reviews['description'].str.count(r'\btropical\b').sum()
    print(tropical_two)
    

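    A separate exact-match API may not be needed, since str.count can handle exact matches as well. Note that the two snippets in the question measure different things: the lambda with "in" yields one True/False per row, so its sum is the number of rows that contain "tropical", while str.count adds up every occurrence in every row. That is why tropical_two comes out slightly higher.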
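
To make the difference between these counts concrete, here is a minimal sketch on a small made-up Series. The sample strings and variable names are assumptions for illustration only and are not taken from the Kaggle data. It uses str.contains with regex=False for a literal, non-regex substring test, and re.escape so that str.count treats the search term literally even if it contained regex metacharacters.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for reviews['description']
descriptions = pd.Series([
    "tropical is tropical",
    "Aromas include tropical fruit",
    "a subtropical note",
    "ripe and fruity",
])

# Number of rows containing the literal substring at least once
# (equivalent to the answer-key lambda using "in")
rows_with_substring = descriptions.str.contains("tropical", regex=False).sum()

# Total occurrences of the literal substring across all rows;
# re.escape guards against regex metacharacters in the search term
total_occurrences = descriptions.str.count(re.escape("tropical")).sum()

# Total occurrences as a whole word only ("subtropical" is excluded)
whole_word_occurrences = descriptions.str.count(r"\btropical\b").sum()

print(rows_with_substring, total_occurrences, whole_word_occurrences)
```

Which of the three is "right" depends on what is being asked: rows mentioning the term, total mentions, or total whole-word mentions.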