I am trying to create a dataset in R from 3k JSON files. (This is the data: Nauman, F. (2023). Clothing Dataset for Second-Hand Fashion (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8386668)
My goal is to have the data in R as a dataset/table so I can clean it and run some regressions. This is for a school paper, and I'm quite new to R.
Here is my code:
### Reading JSON files
library(rjson)
# Instantiate the data object to hold the JSON details
master_data <- NULL
# Gather the names of the JSON files held in a folder
file_list <- list.files(path = "data/json_files")
# Loop through the list of files, read the JSON details,
# convert to data frame and append to the data object
for (i in 1:length(file_list)){
file_details <- fromJSON(file = paste0("data/json_files/",file_list[i]))
master_data <- rbind(master_data, as.data.frame(file_details))
}
# Check the data object
master_data
#This is the error I'm getting:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
the arguments imply different numbers of rows : 1, 0
#I tried bind_rows() to bind uneven rows
for (i in 1:length(file_list)){
file_details <- fromJSON(file = paste0("data/json_files/",file_list[i]))
master_data <- bind_rows(master_data, as.data.frame(file_details))
}
#that doesn't work either, same error
#I tried rbind.fill() from the package plyr,
for (i in 1:length(file_list)){
file_details <- fromJSON(file = paste0("data/json_files/",file_list[i]))
master_data <- rbind.fill(master_data, as.data.frame(file_details))
}
#that doesn't work either, same error
#Any ideas will be appreciated. Thank you!
I used the CRAN package rjsoncons.
I listed the full path to files, so I didn't need to construct them by pasting onto a base file name
file_list <- list.files(
"~/tmp/circular_fashion",
pattern = "\\.json$",
recursive = TRUE,
full.names = TRUE
)
I noticed that one of the files is not valid JSON. I found this by iterating through each file and simply trying to read it with j_query(). If reading failed, I printed the file name and the error message, and used 'NA' for the content.
json <- vapply(file_list, function(file) {
tryCatch({
rjsoncons::j_query(file)
}, error = function(e) {
message(file, ": ", conditionMessage(e))
NA_character_
})
}, character(1))
The output is
oct2022/2022-10-17/labels_2022_10_17_07_40_32.json: Extra comma at line 12 and column 6
and that file does contain a trailing comma, which is not valid JSON:
...
"colors": [
"red",
],
...
I removed the corrupt data from the JSON strings that I've read in
json <- json[!is.na(json)]
and then made the R character vector of JSON objects into an array-of-objects
json_array <- paste0("[", paste(json, collapse = ","), "]")
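To make this step concrete, here is a minimal sketch using two made-up JSON strings in place of the real file contents. The names `toy_json` and `toy_array` and the values in them are assumptions for illustration only, not taken from the dataset.

```r
# Two toy JSON objects standing in for the per-file strings
# (hypothetical values, not from the dataset)
toy_json <- c(
  '{"brand":"A","price":"<50"}',
  '{"brand":"B","price":">400"}'
)

# Join the objects with commas and wrap them in [ ] to form
# a single JSON array-of-objects string
toy_array <- paste0("[", paste(toy_json, collapse = ","), "]")

cat(toy_array, "\n")
```

The result is one character string holding a JSON array, which is the shape passed on to the next step.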
Finally, I used j_pivot() to change the array-of-objects to an R tibble
tbl <- rjsoncons::j_pivot(json_array, as = "tibble")
Here's the result:
> tbl
# A tibble: 3,052 × 22
brand brandtext category type size colors season pilling condition price
<chr> <list> <chr> <chr> <chr> <list> <chr> <int> <int> <chr>
1 Everest <chr [1]> Children Wint… "104" <chr> Winter 3 3 50-1…
2 Everest <chr [1]> Children Wint… "104" <chr> Winter 3 3 50-1…
3 Not in t… <chr [1]> Men Jack… "M " <chr> Autumn 5 5 >400
4 Everest <chr [1]> Children Jack… "146" <chr> Autumn 5 5 100-…
5 Etirel <chr [1]> Ladies Wint… "40" <chr> Winter 4 3 100-…
6 Lindex <chr [1]> Children Rain… "98" <chr> Spring 4 2 <50
7 Not in t… <chr [1]> Ladies Dress "42" <chr> Autumn 5 5 100-…
8 Not in t… <chr [1]> Men Trou… "Non… <chr> Autumn 5 4 100-…
9 Not in t… <chr [1]> Ladies Blou… "42" <chr> Spring 5 5 50-1…
10 Park Lane <chr [1]> Men Swea… "M " <chr> Autumn 5 5 100-…
# ℹ 3,042 more rows
# ℹ 12 more variables: annotator <list>, cut <list>, pattern <chr>, trend <chr>,
# smell <list>, stains <chr>, holes <list>, damage <chr>, material <chr>,
# comment <chr>, usage <chr>, weight <list>
# ℹ Use `print(n = ...)` to see more rows
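Several columns in the result (for example colors) are list columns, which may need flattening before they can be used in a regression. The following is only a sketch of one possible approach in base R, on a small made-up data frame; the object `toy`, the helper columns `n_colors` and `colors_flat`, and all values are assumptions for illustration, not part of the dataset.

```r
# Toy stand-in for the tibble above; the column names mirror the printed
# output, the values are made up for illustration
toy <- data.frame(brand = c("Everest", "Lindex"), pilling = c(3L, 4L))
toy$colors <- list(c("red", "blue"), character(0))

# Count how many values each row holds in the list column
toy$n_colors <- lengths(toy$colors)

# Collapse each list element to a single string; empty elements become NA
toy$colors_flat <- vapply(
  toy$colors,
  function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = ";"),
  character(1)
)

str(toy)
```

Whether to collapse, count, or expand such columns into one row per value depends on the analysis, so treat this as a starting point rather than a recommendation.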