r gsub

Removing ★ symbol from a column?


I am trying to remove the ★ symbol from a column, to no avail.

dput(star$fifa21_raw_data.W.F[1:20])

c("4 ★", "4 ★", "3 ★", "5 ★", "5 ★", "4 ★", "4 ★", "3 ★", "3 ★", "4 ★", "3 ★", "4 ★", "3 ★", "3 ★", "4 ★", "4 ★", "3 ★", "4 ★", "3 ★", "4 ★")

I looked everywhere and it seems gsub should do the trick here, so I tried:

gsub('[★]([0-9]+)','\\1\\2', star$fifa21_raw_data.W.F)

and R printed the same result:

 [1] "4 ★" "4 ★" "3 ★" "5 ★" "5 ★" "4 ★" "4 ★" "3 ★" "3 ★" "4 ★"
[11] "3 ★" "4 ★" "3 ★" "3 ★" "4 ★" "4 ★" "3 ★" "4 ★" "3 ★" "4 ★"

I am at a loss. Thank you for assisting me in this.


Solution

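  • Your pattern [★]([0-9]+) does not match the example data: it expects the ★ first, immediately followed by one or more digits, whereas in the data the digit comes first, followed by a space and then the ★. Because nothing matches, gsub returns the input unchanged. Also note that the pattern has only a single capture group (for the digits), so there is no group 2 for \\2 to refer to.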
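
    As a quick way to see the mismatch, the check below runs the original pattern against one value shaped like the data and one value with the order reversed. The variable names and the reversed string "★4" are made up for illustration, and the sketch assumes a UTF-8 session. The first result should be FALSE and the second TRUE.

    ```r
    # Pattern from the question: a star, then one or more digits
    original_pattern <- "[★]([0-9]+)"

    value_from_data <- "4 ★"   # same shape as the column values
    value_reversed  <- "★4"    # hypothetical string the pattern was written for

    # grepl reports whether the pattern matches anywhere in the string
    matches_data     <- grepl(original_pattern, value_from_data)
    matches_reversed <- grepl(original_pattern, value_reversed)

    print(c(data = matches_data, reversed = matches_reversed))
    ```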

    To match the format in your example data, capture a single digit, then match the whitespace and the ★ that follow it:

    ([0-9])\\s+★
    

    This matches a digit followed by one or more whitespace characters and the ★. Use group 1 in the replacement so that only the digit is kept.

    items <- c("4 ★", "4 ★", "3 ★", "5 ★", "5 ★", "4 ★", "4 ★", "3 ★", "3 ★", "4 ★", "3 ★", "4 ★", "3 ★", "3 ★", "4 ★", "4 ★", "3 ★", "4 ★", "3 ★", "4 ★")
    result <- gsub("([0-9])\\s+★", "\\1", items)
    
    print(result)
    

    Output

    [1] "4" "4" "3" "5" "5" "4" "4" "3" "3" "4" "3" "4" "3" "3" "4" "4" "3" "4" "3" "4"
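
    If the goal is a numeric column rather than strings, one possible follow-up is to strip the trailing star and convert the result with as.integer. This is only a sketch: the small data frame below is a stand-in built for illustration (reusing the column name from the question), and it assumes every value ends in optional whitespace plus a single ★.

    ```r
    # Stand-in for the real data frame; only three rows for illustration
    star <- data.frame(fifa21_raw_data.W.F = c("4 ★", "3 ★", "5 ★"))

    # Drop optional whitespace and the star at the end of each string.
    # sub() is enough here because there is at most one star per value.
    digits_only <- sub("\\s*★$", "", star$fifa21_raw_data.W.F)

    # Convert the remaining digit strings to integers and write them back
    star$fifa21_raw_data.W.F <- as.integer(digits_only)

    print(star$fifa21_raw_data.W.F)
    ```

    Values that do not reduce to plain digits would become NA with a warning, which can help spot rows that do not follow the expected format.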