I would like to separate the prod_code_date column (a string of varying lengths, from one "pair" to multiple "pairs") into multiple columns based on the delimiter ":" and ",".
The ":" delimiter separates related information into product code and date of purchase - which can be thought of as paired information, while the "," delimiter separates the different pairs of information for the same product number (prod_no).
Expected intermediate result from separate_wider_delim The number of columns created should be based on the number of delimiters in the column. The column names should be code_1, date_1, code_2, date_2, code_x, date_x, etc.
Final expected result (sample data below): A long table with the following columns: prod_no, code, date, with repeated prod_no rows for as many pairs as there are in the prod_code_date column.
library(tidyverse)
# Data
df <- tibble(prod_no = 1:4, prod_code_date = c("' ZB10.90 : 2013-04-29'", "' XJ11.90 : 2016-10-20, ZB25.22 : 2013-10-16, ZB25.29 : 2011-12-06, XJ14.20 : 2022-03-23, ZB10.90 : 2022-12-16, ZB10.90 : 2011-12-06, QP50.19 : 2016-03-11, QP12.90 : 2012-01-20, MS44.9 : 2022-03-23'", "' MS34.3 : 2022-10-04, QP13.20 : 1998-05-26, QP50.13 : 2008-10-10, MS44.9 : 2017-05-16'", "' QP10.90 : 2008-08-11, QP11.90 : 2019-04-15'"))
# Attempt (failed) using separate_wider_delim() function. After which I would have pivoted the data to a long format.
intermediate_result <- df %>% separate_wider_delim(prod_code_date, delim = c(":", ","), names = c("code_1", "date_1", "code_2", "date_2", "code_x", "date_x"))
# Expected output: A long table with the following columns: prod_no, code, date, with repeated prod_no rows for as many pairs as there are in the prod_code_date column.
final_result <- tibble(prod_no = c(1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4), code = c("QP10.90", "XJ11.90", "QP25.22", "QP25.29", "XJ14.20", "QP10.90", "QP10.90", "QP50.19", "QP12.90", "MS34.3", "QP13.20", "QP50.13", "MS44.9", "QP10.90", "QP11.90"), date = c("2013-04-29", "2016-10-20", "2013-10-16", "2011-12-06", "2022-03-23", "2022-12-16", "2011-12-06", "2016-03-11", "2012-01-20", "2022-10-04", "1998-05-26", "2008-10-10", "2017-05-16", "2008-08-11", "2019-04-15"))
An approach using separate_longer_delim
and a consecutive separate_wider_delim
library(dplyr)
library(tidyr)
df %>%
separate_longer_delim(prod_code_date, ",") %>%
mutate(prod_code_date = gsub(" +|'", "", prod_code_date)) %>%
separate_wider_delim(prod_code_date, delim=":", names=c("code", "date"))
# A tibble: 16 × 3
prod_no code date
<int> <chr> <chr>
1 1 ZB10.90 2013-04-29
2 2 XJ11.90 2016-10-20
3 2 ZB25.22 2013-10-16
4 2 ZB25.29 2011-12-06
5 2 XJ14.20 2022-03-23
6 2 ZB10.90 2022-12-16
7 2 ZB10.90 2011-12-06
8 2 QP50.19 2016-03-11
9 2 QP12.90 2012-01-20
10 2 MS44.9 2022-03-23
11 3 MS34.3 2022-10-04
12 3 QP13.20 1998-05-26
13 3 QP50.13 2008-10-10
14 3 MS44.9 2017-05-16
15 4 QP10.90 2008-08-11
16 4 QP11.90 2019-04-15