r dplyr agrep

dplyr filter function in combination with agrep


I'm trying to keep only the rows of my table that have the word "dog" in the title column, but I cannot get it to work.

Here's a data example:

    ID NozamaItemID                                                    NozamaTitle 
1 4557  12000017544 Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each) 
2 4558  12000021992                                        Pepsi, 8Ct, 12Oz Bottle 
3 4559  12000024542                     Zuke'S Natural Hip Action dog Treats, 3 Oz 
4 4560  12000030680                  Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans 
5 4561  12000030680                  Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans 
6 4562  12000030680                  Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans 

The following code should work but does not:

amzp <- select(amz, ID, NozamaItemID, NozamaTitle, NozamaCustomerID)

searchTerm <- "cat|dog"
amzp.a <- mutate(amzp, animalFood = ifelse(grepl(searchTerm, amzp$NozamaTitle, ignore.case = TRUE) == TRUE, TRUE, FALSE))

I would expect to see a TRUE for row 3. Any help is appreciated. Thanks


Solution

  • You are close; you just need to get rid of the ifelse and refer to the column by its bare name:

    amzp.a <- mutate(amzp,
                     animalFood = grepl(searchTerm, NozamaTitle, ignore.case = TRUE))
    

    which gives:

    > amzp.a
        ID NozamaItemID                                                     NozamaTitle animalFood
    1 4557  12000017544  Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)      FALSE
    2 4558  12000021992                                         Pepsi, 8Ct, 12Oz Bottle      FALSE
    3 4559  12000024542                      Zuke'S Natural Hip Action dog Treats, 3 Oz       TRUE
    4 4560  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
    5 4561  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
    6 4562  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
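
    Since the question asks about filtering rather than flagging, the same grepl() call can also be passed straight to filter(). The following is a minimal sketch, assuming dplyr is installed; amzp_small and animal_rows are illustrative names that do not appear in the original post, and the data frame is a reduced stand-in built from the titles in the question:

    ```r
    library(dplyr)

    # Stand-in data (hypothetical name), titles copied from the question
    amzp_small <- data.frame(
      ID = 4557:4560,
      NozamaTitle = c(
        "Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)",
        "Pepsi, 8Ct, 12Oz Bottle",
        "Zuke'S Natural Hip Action dog Treats, 3 Oz",
        "Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans"
      ),
      stringsAsFactors = FALSE
    )

    searchTerm <- "cat|dog"

    # Keep only the rows whose title matches the pattern (case-insensitive)
    animal_rows <- filter(amzp_small,
                          grepl(searchTerm, NozamaTitle, ignore.case = TRUE))
    ```

    Note that "cat|dog" matches substrings, so a title containing "category" would match as well. If only whole words should count, a pattern such as "\\b(cat|dog)\\b" is one option.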
    

    Data used:

    amzp <- structure(list(ID = 4557:4562,
                           NozamaItemID = c(12000017544, 12000021992, 12000024542, 12000030680, 12000030680, 12000030680),
                           NozamaTitle = structure(c(4L, 1L, 2L, 3L, 3L, 3L), .Label = c("Pepsi, 8Ct, 12Oz Bottle","Zuke'S Natural Hip Action dog Treats, 3 Oz","Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans","Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)"), class = "factor")),
                      .Names = c("ID", "NozamaItemID", "NozamaTitle"), class = "data.frame", row.names = c(NA, -6L))
    

    EDIT: Your original code:

    amzp.a <- mutate(amzp, animalFood = ifelse(grepl(searchTerm, amzp$NozamaTitle, ignore.case = TRUE) == TRUE, TRUE, FALSE))
    

    does actually work. Although it contains several components that are not needed (the ifelse() call, the == TRUE comparison, and the amzp$ prefix inside a dplyr verb), it gives the desired result:

    > amzp.a
        ID NozamaItemID                                                     NozamaTitle animalFood
    1 4557  12000017544  Starbucks Double Shot Espresso Light (4 Count, 6.5 Fl Oz Each)      FALSE
    2 4558  12000021992                                         Pepsi, 8Ct, 12Oz Bottle      FALSE
    3 4559  12000024542                      Zuke'S Natural Hip Action dog Treats, 3 Oz       TRUE
    4 4560  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
    5 4561  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
    6 4562  12000030680                   Pepsi Made With Real Sugar, 12 Ct, 12 Oz Cans      FALSE
    

    So you may want to describe in more detail what "does not work" means, for example the error message or the unexpected output you get.
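
    As a side note on why the ifelse wrapper is redundant: grepl() already returns a logical vector, so comparing it to TRUE and mapping the result back to TRUE/FALSE changes nothing. This can be checked in base R with a small sketch; titles, plain and wrapped are illustrative names, not part of the original post:

    ```r
    # Two of the titles from the question, as a plain character vector
    titles <- c("Pepsi, 8Ct, 12Oz Bottle",
                "Zuke'S Natural Hip Action dog Treats, 3 Oz")

    # grepl() on its own already yields TRUE/FALSE per element
    plain <- grepl("cat|dog", titles, ignore.case = TRUE)

    # The form used in the question: same values, extra work
    wrapped <- ifelse(plain == TRUE, TRUE, FALSE)

    # Compare the two results
    same <- identical(plain, wrapped)
    print(same)
    ```

    The one case where the two forms could differ in spirit is missing data: grepl() returns FALSE for NA input, so neither version produces NA here.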