r regex string

Change to standard date format with different incoming date formats


I want to convert my dates to a standard dd-mm-yyyy format, but they arrive in different formats. For example, I want to convert the following date ranges, keeping only the earliest date of each range as the start date rather than the whole range.

myDates = c("25 - 27 February 2025", "25 February - 1 March 2025")

Handled on its own, the first format is not a problem: I can gsub out the part I don't want and then change the format with the following line: format(strptime(gsub(" - [0-9]+", "", myDates[1]), "%d %B %Y"), "%d-%m-%Y")

However, I'm not able to extend this to handle both formats.


Solution

  • You could extract the individual date components and then paste them back together. The stringr package is handy here:

    library(stringr)
    
    format(
      strptime(
        paste(str_extract(myDates, "\\d+"),        # Day (first run of digits)
              str_extract(myDates, "[A-Z][a-z]+"), # Month (first capitalized word)
              str_extract(myDates, "\\d{4}")),     # Year (first four consecutive digits)
        format="%d %B %Y"), "%d-%m-%Y")
    
    [1] "25-02-2025" "25-02-2025"
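
If you would rather avoid the extra package, the same idea (take the first day, the first month name and the year) can be sketched in base R. This is only a sketch under some assumptions: start_date() is a hypothetical helper name, every string starts with the day and ends with a four-digit year, and the months are spelled out in English. It looks the month up in the built-in month.name vector instead of relying on %B, which depends on the session locale.

```r
# start_date() is a hypothetical helper name, not part of the answer above.
start_date <- function(x) {
  # Captures: (1) leading day, (2) first capitalized word, (3) trailing 4-digit year.
  pattern <- "^([0-9]+)[^A-Za-z]*([A-Z][a-z]+).*([0-9]{4})$"
  day   <- as.integer(sub(pattern, "\\1", x))
  month <- match(sub(pattern, "\\2", x), month.name)  # English month name -> 1..12
  year  <- sub(pattern, "\\3", x)
  sprintf("%02d-%02d-%s", day, month, year)
}

myDates <- c("25 - 27 February 2025", "25 February - 1 March 2025")
result <- start_date(myDates)
print(result)
```

Note that both versions attach the single year found in the string to the start date, so a range that crosses a year boundary but states only the end year (for example "30 December - 2 January 2026") would need separate handling.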