r dplyr grepl

Using grepl with OR condition to filter string


Currently I have a df like so:

df <- data.frame(
  player = c('Player To Have 1 Or More Shots On Target', 'Player To Have 1 Or More Shots On Target', 
             'Player To Have 2 Or More Shots On Target', 'Player To Have 3 Or More Shots On Target',
             'Player To Have 1 Or More Shots On Target in 1st Half'))

Output:

                                                player
1             Player To Have 1 Or More Shots On Target
2             Player To Have 1 Or More Shots On Target
3             Player To Have 2 Or More Shots On Target
4             Player To Have 3 Or More Shots On Target
5 Player To Have 1 Or More Shots On Target in 1st Half

I would like to use grepl (or another suitable alternative) to capture only the rows for 1, 2, 3, 4, etc. shots on target, disregarding anything else, such as row 5, which also contains 'in 1st Half'.

In the example above, I wish to capture all of the first 4 rows (the original data has many more rows). I tried the following which works:

df2 <- dplyr::filter(df, grepl("Player To Have 1 Or More Shots On Target", player))

How can the above be amended to match other numbers in place of the "1"? E.g. I would like to capture 1, 2, 3, 4, etc. shots.

I tried something like:

number_of_shots <- c("1","2")
df2 <- dplyr::filter(df, grepl("Player To Have", number_of_shots, "Or More Shots On Target", player))

But I get the following error:

Error in `dplyr::filter()`:
ℹ In argument: `grepl(...)`.
Caused by error:
! `..1` must be of size 5 or 1, not size 2.

Solution

  • Regular expressions can be used

    
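    library(dplyr)  # provides %>% and filter()

    df <- data.frame(
      player = c('Player To Have 1 Or More Shots On Target', 'Player To Have 1 Or More Shots On Target', 
                 'Player To Have 10 Or More Shots On Target', 'Player To Have 3 Or More Shots On Target',
                 'Player To Have 1 Or More Shots On Target in 1st Half'))
    
    # match a single digit 0-9; ^ and $ anchor the whole string, so the
    # "in 1st Half" row is excluded (use [0-9]+ to also match counts such as 10)
    df %>%
      filter(grepl('^Player To Have [0-9] Or More Shots On Target$', player))
    
    # match anything between the fixed parts (.* is not limited to digits)
    df %>%
      filter(grepl('^Player To Have .* Or More Shots On Target$', player))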
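
If only certain counts should be kept, which is what the `number_of_shots` attempt in the question seems to be aiming for, one option is to collapse that vector into a regex alternation with `paste()`. `grepl()` takes a single pattern string, so the pieces have to be joined into one pattern instead of being passed as separate arguments. The following is a minimal sketch under that assumption; the names `any_count`, `some_counts` and `pattern` are only illustrative and the sample data is shortened.

```r
library(dplyr)

df <- data.frame(
  player = c('Player To Have 1 Or More Shots On Target',
             'Player To Have 10 Or More Shots On Target',
             'Player To Have 3 Or More Shots On Target',
             'Player To Have 1 Or More Shots On Target in 1st Half'))

# one or more digits: keeps the 1, 10 and 3 rows, drops the "in 1st Half" row
any_count <- df %>%
  filter(grepl('^Player To Have [0-9]+ Or More Shots On Target$', player))

# only specific counts: build an alternation such as "(1|3)" from the vector
number_of_shots <- c("1", "3")
pattern <- paste0('^Player To Have (',
                  paste(number_of_shots, collapse = '|'),
                  ') Or More Shots On Target$')

# keeps the 1 and 3 rows; "10" is not matched because the group must be
# followed directly by " Or More"
some_counts <- df %>%
  filter(grepl(pattern, player))
```

The anchors `^` and `$` do the work of excluding longer strings in both cases, so they should stay in place whichever middle part is used.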