[SOLVED] Separating text in R

I have a data.frame that contains a column named movies_name. This column contains data in this format: City of Lost Children, The (Cité des enfants perdus, La) (1995). I want to separate the year from the rest of the movie name without losing the text inside the brackets. To be more precise, I want to create one new column holding the year and another holding the movie name alone.

I tried this approach, but now I cannot put the movie name back together:

My approach

Thanks

Solution

Try the function extract from tidyr (part of the tidyverse):

library(tidyverse)
df %>%
  extract(movies_name,
          into = c("title", "year"), 
          regex = "(\\D+)\\s\\((\\d+)\\)")
                                                         title year
    1 City of Lost Children, The (Cité des enfants perdus, La) 1995
    2                                             another film 2020

How the regex works:

(\\D+): first capture group, matching one or more characters that are not digits
\\s\\(: a whitespace and an opening parenthesis (not captured)
(\\d+): second capture group, matching one or more digits
\\): closing parenthesis (not captured)
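
To see the two capture groups on their own, here is a minimal base R sketch that applies the same pattern with regexec() and regmatches(). The names x, pattern, m and parts are only illustrative and do not come from the answer above; perl = TRUE is an assumption made to stay close to Perl-style matching.

```r
# Same pattern as above, applied with base R only (no tidyverse needed)
x <- c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
       "another film (2020)")
pattern <- "(\\D+)\\s\\((\\d+)\\)"

# regexec() reports the whole match plus one entry per capture group
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# In each result, element 1 is the whole match, 2 the title, 3 the year
parts <- data.frame(
  title = vapply(m, `[`, character(1), 2),
  year  = vapply(m, `[`, character(1), 3)
)
parts
```

The year comes back as a character string here, so wrap it in as.integer() if a numeric column is needed.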

Data 1:

df <- data.frame(
  movies_name = c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
                  "another film (2020)")
)

EDIT:

Okay, following the comment, let's make this a little more complex by including a title that itself contains digits:

Data 2:

df <- data.frame(
  movies_name = c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
                  "another film (2020)",
                  "Under Siege 2: Dark Territory (1995)")
)
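
Why the first pattern is not enough for this data: \\D+ cannot cross the "2" in the new title, so the match can only start after that digit and the front of the title is silently dropped. A small base R sketch of the effect (x, first_pattern and m are illustrative names, not part of the original answer):

```r
# Only the new title from Data 2 is needed to show the problem
x <- "Under Siege 2: Dark Territory (1995)"
first_pattern <- "(\\D+)\\s\\((\\d+)\\)"

# Element 1 is the whole match, 2 the "title" group, 3 the year group
m <- regmatches(x, regexec(first_pattern, x, perl = TRUE))[[1]]

m[2]  # the title group: everything up to and including the "2" is missing
m[3]  # the year group is still found
```

The title group comes back as ": Dark Territory", with no error or warning, so this kind of truncation is easy to miss in a large data.frame.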

Solution - actually easier than the previous one ;)

df %>%
  extract(movies_name,
          into = c("title", "year"), 
          regex = "(.+)\\s\\((\\d+)\\)")
                                                     title year
1 City of Lost Children, The (Cité des enfants perdus, La) 1995
2                                             another film 2020
3                            Under Siege 2: Dark Territory 1995
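
The second pattern works because .+ is greedy: it first grabs as much as it can and then backtracks until the rest of the pattern, a space followed by digits in parentheses, still fits. As a possible hardening (a sketch, not part of the original answer), the pattern can be anchored with ^ and $ and limited to four digits, and convert = TRUE can be passed to extract so that year becomes an integer column. The fourth row below is an extra title added purely for illustration.

```r
library(tidyr)

df <- data.frame(
  movies_name = c("City of Lost Children, The (Cité des enfants perdus, La) (1995)",
                  "another film (2020)",
                  "Under Siege 2: Dark Territory (1995)",
                  "2001: A Space Odyssey (1968)")  # extra row, illustration only
)

# ^ and $ force the year group to sit at the very end of the string;
# \\d{4} accepts exactly four digits; convert = TRUE runs type.convert()
# on the new columns, which turns year into an integer
out <- tidyr::extract(
  df, movies_name,
  into = c("title", "year"),
  regex = "^(.+)\\s\\((\\d{4})\\)$",
  convert = TRUE
)
str(out)
```

Rows that do not match the stricter pattern get NA in both new columns, which makes malformed entries easier to spot than a partial match would.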