rrjson

JSON file to R dataframe


I have a JSON file. While the original file is quite large, I reduced to a much smaller reproducible example for the purposes of this question (I still get the same error no matter what size):

{
  "relationships_followers": [
    {
      "title": "",
      "media_list_data": [
        
      ],
      "string_list_data": [
        {
          "href": "https://www.instagram.com/testaccount1",
          "value": "testaccount1",
          "timestamp": 1669418204
        }
      ]
    },
    {
      "title": "",
      "media_list_data": [
        
      ],
      "string_list_data": [
        {
          "href": "https://www.instagram.com/testaccount2",
          "value": "testaccount2",
          "timestamp": 1660426426
        }
      ]
    },
    {
      "title": "",
      "media_list_data": [
        
      ],
      "string_list_data": [
        {
          "href": "https://www.instagram.com/testaccount3",
          "value": "testaccount3",
          "timestamp": 1648230499
        }
      ]
    },
       {
      "title": "",
      "media_list_data": [
        
      ],
      "string_list_data": [
        {
          "href": "https://www.instagram.com/testaccount4",
          "value": "testaccount4",
          "timestamp": 1379513403
        }
      ]
    }
  ]
}

I am attempting to convert it into a dataframe in R, which contains the values for href, value, and the timestamp variables:

enter image description here

But when I run the following, which I pulled from another SO answer about converting JSON to R:

library("rjson")

result <- fromJSON(file = "test_file.json")

json_data_frame <- as.data.frame(result)

I get met with this error about differing rows.

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
  arguments imply differing number of rows: 1, 0

How can I get what I have into the desired DF format?


Solution

  • That is because there is nested data.

    df<- as.data.frame(do.call(rbind, lapply(
      lapply(result$relationships_followers, "[[", "string_list_data"), "[[", 1)))
    
    df
    #>      href                                     value          timestamp 
    #>  "https://www.instagram.com/testaccount1" "testaccount1" 1669418204
    #>  "https://www.instagram.com/testaccount2" "testaccount2" 1660426426
    #>  "https://www.instagram.com/testaccount3" "testaccount3" 1648230499
    #>  "https://www.instagram.com/testaccount4" "testaccount4" 1379513403
    

    NOTE: jsonlite package does a better job on parsing data.frame by default.