Problems reading JSON file in R


I have a JSON file (an export from MongoDB) that I'd like to load into R. The file is about 890 MB, with roughly 63,000 rows of 12 fields each. The fields are numeric, character, and date values. I'd like to end up with a 63000 x 12 data frame. My first attempt was plain readLines:

lines <-  readLines("fb2013.json")

Result: lines is a character vector with all 63,000 elements, and the 12 fields of each record are lumped together into a single string.

Each element looks something like this:

"{ \"_id\" : \"10151271769737669\", \"comments_count\" : 36, \"created_at\" : { \"$date\" : 1357941938000 }, \"icon\" : \"http://blahblah.gif\", \"likes_count\" : 450, \"link\" : \"http://www.blahblahblah.php\", \"message\" : \"I wish I could figure this out!\", \"page_category\" : \"Computers\", \"page_id\" : \"30968999999\", \"page_name\" : \"NothingButTrouble\", \"type\" : \"photo\", \"updated_at\" : { \"$date\" : 1358210153000 } }"

Using rjson:

jFile <- fromJSON(paste(readLines("fb2013.json"), collapse=""))

Only the first row is read into jFile, although it does have all 12 fields.

Using RJSONIO:

jFile <- fromJSON(lines)

results in the following:

Warning messages:
1: In if (is.na(encoding)) return(0L) :
  the condition has length > 1 and only the first element will be used

Again, only the first row is read into jFile and there are 12 fields.

The output from rjson and RJSONIO looks something like this:

$`_id`
[1] "1018535"

$comments_count
[1] 0

$created_at
       $date 
1.357027e+12 

$icon
[1] "http://blah.gif"

$likes_count
[1] 20

$link
[1] "http://www.chachacha"

$message
[1] "I'd love to figure this out."

$page_category
[1] "Internet/software"

$page_id
[1] "3924395872345878534"

$page_name
[1] "Not Entirely Hopeless"

$type
[1] "photo"

$updated_at
       $date 
1.357027e+12 

Solution

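  • Each line of the export is a complete JSON document of its own, so parse the file line by line rather than as one document. Try:

    library(rjson)
    path <- "WHERE/YOUR/JSON/IS/SAVED"
    con <- file(path, "r")         # named con so it does not mask base::c
    l <- readLines(con, -1L)       # one element per record
    close(con)                     # release the connection
    json <- lapply(X=l, fromJSON)  # list with one parsed record per line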
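
The list produced by lapply still has to be turned into the 63000 x 12 data frame the question asks for. Here is a minimal sketch of one way to do that. It assumes every record carries the same fields, and that each { "$date" : ... } wrapper holds milliseconds since the Unix epoch, as is usual in MongoDB exports. The two sample records and the helper get_field are made up for illustration and are not part of the original answer.

```r
library(rjson)

# Two made-up records in the same one-document-per-line layout as the export
l <- c(
  '{ "_id" : "1", "comments_count" : 36, "created_at" : { "$date" : 1357941938000 }, "type" : "photo" }',
  '{ "_id" : "2", "comments_count" : 0, "created_at" : { "$date" : 1357027200000 }, "type" : "link" }'
)
json <- lapply(l, fromJSON)

# Hypothetical helper: pull one field out of a parsed record and
# unwrap a { "$date" : ms } wrapper to the bare millisecond value
get_field <- function(rec, f) {
  x <- rec[[f]]
  if ("$date" %in% names(x)) x[["$date"]] else x
}

# Assumption: the first record lists every field that occurs in the file
fields <- names(json[[1]])

# Build one vector per field, then bind the vectors into columns
cols <- lapply(fields, function(f) sapply(json, get_field, f = f))
names(cols) <- fields
df <- data.frame(cols, check.names = FALSE, stringsAsFactors = FALSE)

# Assumption: $date values are milliseconds since 1970-01-01 UTC
df$created_at <- as.POSIXct(df$created_at / 1000,
                            origin = "1970-01-01", tz = "UTC")

str(df)
```

With the real file, drop the sample vector and use the l returned by readLines, then convert updated_at the same way as created_at. If some records lack a field, sapply returns a list for that column instead of a vector and the sketch would need a default value for the missing entries.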