So I have a dataframe made up of thousands of records that I have imported from .csv. One variable within the dataframe is a free text field derived from a lexicon. The rows of data are in the below format.
Please note that the lines below are not vectors but rows of character data within a variable 'date' (they just happen to look exactly like vectors):
c("9th november 2018", "27th october 2018"),
c("three months", "6 months"),
c("24th december ", "2th january 2019", "25th january 2019")
Essentially, all I want to do is take the string inside the first set of quotation marks and remove the rest, so:
c("9th november 2018", "27th october 2018")
9th november 2018
I am using the following code but it is taking the string from the last set of quotation marks:
LexiDate3$finaldat3 <- sub('.*,"*(.*?) *" *', '\\1', LexiDate3$Date_new)
which returns:
27th october 2018")
Not ideal, and for the life of me I can't figure this one out. Any help would be greatly appreciated.
Thanks.
How does this look? Note the quotes around the output are put there by the print method and not embedded in the string.
library(stringr)
test <- 'c("9th november 2018", "27th october 2018"),'
str_extract(test,'(?<=")(.*?)(?=")')
#> [1] "9th november 2018"
Created on 2019-02-21 by the reprex package (v0.2.1)
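As for why the original pattern returns the last item: the leading `.*,` is greedy, so it runs all the way to the last comma before the capture group gets a chance to match. If you would rather stay in base R, here is a rough sketch that anchors on the first double quote instead. The sample data frame is made up for illustration (only the names `LexiDate3`, `Date_new` and `finaldat3` come from the question), and the `trimws()` call is my own assumption that you don't want the trailing space in entries like "24th december ".

```r
# Hypothetical sample data mimicking the column described in the question
LexiDate3 <- data.frame(
  Date_new = c(
    'c("9th november 2018", "27th october 2018")',
    'c("three months", "6 months")',
    'c("24th december ", "2th january 2019", "25th january 2019")'
  ),
  stringsAsFactors = FALSE
)

# ^[^"]*    skip everything before the first double quote
# "([^"]*)" capture the text between the first pair of double quotes
# .*$       consume the rest of the string so it is dropped
LexiDate3$finaldat3 <- sub('^[^"]*"([^"]*)".*$', '\\1', LexiDate3$Date_new)

# Assumption: stray leading/trailing whitespace is not wanted
LexiDate3$finaldat3 <- trimws(LexiDate3$finaldat3)

print(LexiDate3$finaldat3)
```

Both `sub()` and `str_extract()` are vectorised, so either one can be applied to the whole column in a single call. Note that a row containing no double quotes at all is left unchanged by `sub()`, whereas `str_extract()` would return `NA` for it.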