rsubstringdatefilter

Substring a statement after character matching and year


I am trying to extract certain rows based on year from my dataset, furthermore I want to substring those rows matching the following conditions, for year 2017 I want to substring the the portion before the second '-' in the statment for eg: "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email" I would want only "All-SOV-OutR-Sov_Score-18Dec.Email" and for 2018 I want to remove the portion after the '.' for eg: "IVP Program Template.IVP Email Template" I want "IVP Program Template"

I have tried using

data$col <- sub(".*:", "", data$`Email Name`)
data$col2 <- substring(data$`Email Name`, regexpr(".", data$`Email Name`) + 1)

but none of it is working and returns the statements as is, also for filtering based on year I tried using the filter function filter(data, as.Date(data$First Activity (EDT)) = "2017") but it gives me syntax error

My dataset is like this: enter image description here


Solution

  • Here is the regex that should give you the desired result for 2017 values:

    sub(".*?-.*?-", "", "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email")
    # "All-SOV-OutR-Sov_Score-18Dec.Email"
    

    The one for 2018 values:

    sub("\\..*", "", "IVP Program Template.IVP Email Template")
    # IVP Program Template
    

    You can then apply the regex functions with ifelse:

    library(lubridate)
    
    data$email_adj <- NA
    
    data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2017", sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
    
    data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2018", sub("\\..*", "", data$`Email Name`), data$email_adj)
    
    
    

    If you want to filter by month instead of year use the month instaed of the year function (in the example I only selected months from April until July):

    library(lubridate)
    
    data$email_adj <- NA
    
    data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
    
    data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub("\\..*", "", data$`Email Name`), data$email_adj)