I am trying to extract certain rows based on year from my dataset, furthermore I want to substring those rows matching the following conditions, for year 2017 I want to substring the the portion before the second '-' in the statment for eg: "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email" I would want only "All-SOV-OutR-Sov_Score-18Dec.Email" and for 2018 I want to remove the portion after the '.' for eg: "IVP Program Template.IVP Email Template" I want "IVP Program Template"
I have tried using
data$col <- sub(".*:", "", data$`Email Name`)
data$col2 <- substring(data$`Email Name`, regexpr(".", data$`Email Name`) + 1)
but none of it is working and returns the statements as is, also for filtering based on year I tried using the filter function
filter(data, as.Date(data$
First Activity (EDT)) = "2017")
but it gives me syntax error
Here is the regex that should give you the desired result for 2017 values:
sub(".*?-.*?-", "", "17Q4-EMEA-All-SOV-OutR-Sov_Score-18Dec.Email")
# "All-SOV-OutR-Sov_Score-18Dec.Email"
The one for 2018 values:
sub("\\..*", "", "IVP Program Template.IVP Email Template")
# IVP Program Template
You can then apply the regex functions with ifelse
:
library(lubridate)
data$email_adj <- NA
data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2017", sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
data$email_adj <- ifelse(year(mdy(data$`First Activity (EDT)`)) %in% "2018", sub("\\..*", "", data$`Email Name`), data$email_adj)
If you want to filter by month instead of year use the month
instaed of the year
function (in the example I only selected months from April until July):
library(lubridate)
data$email_adj <- NA
data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub(".*?-.*?-", "", data$`Email Name`), data$email_adj)
data$email_adj <- ifelse(month(mdy(data$`First Activity (EDT)`)) %in% 4:7, sub("\\..*", "", data$`Email Name`), data$email_adj)