Hi and thanks for reading me. I have a data frame that have the next structure:
# A tibble: 6 × 2
customerId transactionHistory
<dbl> <chr>
1 40895 [{'transactionId': 0, 'date': '2024-05-04', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'trans…
2 42966 [{'transactionId': 0, 'date': '2024-05-21', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'trans…
3 51806 [{'transactionId': 0, 'date': '2024-04-22', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'trans…
4 53366 [{'transactionId': 0, 'date': '2024-05-23', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'trans…
5 53632 [{'transactionId': 0, 'date': '2024-04-24', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'trans…
6 53871 [{'transactionId': 0, 'date': '2024-04-18', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}]
And I want to unnest the data to be able on get a bigger table with all the transactions in a properly data frame format, but I dont be able to find a way to do that, because the transactionHistory
column is a string, not a json.
Does anyone knows a way to do that?
Here is one possible approach which first replace single quotes in the JSON string by double quotes (see Single vs double quotes in JSON), then uses lapply
and jsonlite::fromJSON
to convert to a data.frame and finally unnest
s the column of transaction histories.
Using a minified reproducible example:
dat <- tibble::tibble(
customerId = c(40895, 42966),
transactionHistory = c(
"[{'transactionId': 0, 'date': '2024-05-04', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'transactionId': 1, 'date': '2024-05-04', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}]",
"[{'transactionId': 2, 'date': '2024-05-04', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}, {'transactionId': 3, 'date': '2024-05-04', 'status': 'SUCCESSFUL', 'transactionAmount': '$40.00'}]"
)
)
dat
#> # A tibble: 2 × 2
#> customerId transactionHistory
#> <dbl> <chr>
#> 1 40895 [{'transactionId': 0, 'date': '2024-05-04', 'status': 'SUCCESSFUL'…
#> 2 42966 [{'transactionId': 2, 'date': '2024-05-04', 'status': 'SUCCESSFUL'…
library(dplyr, warn = FALSE)
library(tidyr)
library(jsonlite)
dat |>
mutate(
# Replace ' by \"
transactionHistory = gsub("\\'", "\"", transactionHistory),
# Convert to dataframe
transactionHistory = lapply(transactionHistory, jsonlite::fromJSON)
) |>
tidyr::unnest(transactionHistory)
#> # A tibble: 4 × 5
#> customerId transactionId date status transactionAmount
#> <dbl> <int> <chr> <chr> <chr>
#> 1 40895 0 2024-05-04 SUCCESSFUL $40.00
#> 2 40895 1 2024-05-04 SUCCESSFUL $40.00
#> 3 42966 2 2024-05-04 SUCCESSFUL $40.00
#> 4 42966 3 2024-05-04 SUCCESSFUL $40.00