ruby json

When to use dump vs. generate vs. to_json and load vs. parse in Ruby's JSON lib?


david4dev's answer to this question claims that there are three equivalent ways to convert an object to a JSON string using the json library:

JSON.dump(object)
JSON.generate(object)
object.to_json

and two equivalent ways to convert a JSON string to an object:

JSON.load(string)
JSON.parse(string)

But looking at the source code, each of them seems to be implemented quite differently, and they do not behave identically (e.g., 1).

What are the differences among them? When to use which?


Solution

  • Summary

    In general, use to_json or JSON::generate to produce JSON, and JSON::parse to read it.

    For some special use cases, you may want dump or load, but it's unsafe to use load on data you didn't create yourself.

    Extended Explanation

    JSON::dump vs JSON::generate

    As part of its argument signature, JSON::generate allows you to set options such as indent levels and whitespace particulars. JSON::dump, on the other hand, calls ::generate within itself, with specific pre-set options, so you lose the ability to set those yourself.
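
    As a rough illustration of the difference described above, the sketch below formats the same hash with JSON::generate (with and without formatting options) and with JSON::dump. The sample data and variable names are made up for the example.

    ```ruby
    require 'json'

    # Hypothetical sample data for illustration.
    data = { "name" => "example", "values" => [1, 2, 3] }

    # Default output: compact, no extra whitespace.
    compact = JSON.generate(data)

    # Formatting options passed through generate's second argument.
    pretty = JSON.generate(data, indent: "  ", space: " ", object_nl: "\n", array_nl: "\n")

    # dump applies its own pre-set options; for plain data like this
    # the result matches the compact form.
    dumped = JSON.dump(data)

    puts compact
    puts pretty
    puts dumped
    ```

    For plain hashes and arrays like this one, the pre-set options used by ::dump do not change the output, so the choice only matters once you need formatting control.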

    According to the docs, JSON::dump is meant to be part of the Marshal::dump implementation scheme. The main reason you'd want to explicitly use ::dump yourself would be that you are about to stream your JSON data (over a socket for instance), since ::dump allows you to pass an IO-like object as the second argument. Unfortunately, the JSON data being produced is not really streamed as it is produced; it is created en masse and only sent once the JSON is fully created. This makes having an IO argument useful only in trivial cases.
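
    A minimal sketch of the IO form, using a StringIO as a stand-in for a socket or file (the payload is invented for the example). As noted above, the whole JSON string is built first and then written in one go.

    ```ruby
    require 'json'
    require 'stringio'

    # StringIO stands in for any IO-like object that responds to #write.
    io = StringIO.new

    # Hypothetical payload for illustration.
    payload = { "event" => "ping", "count" => 3 }

    # The second argument receives the generated JSON.
    JSON.dump(payload, io)

    written = io.string
    puts written
    ```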

    The final difference between the two is that ::dump can also take a limit argument that causes it to raise an ArgumentError when a certain nesting depth is exceeded.
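
    The limit behaviour can be sketched as follows. The nested array is a made-up example, and nil is passed explicitly in the IO position so that the third argument is read as the limit.

    ```ruby
    require 'json'

    # Four levels of array nesting (hypothetical sample data).
    nested = [[[[1]]]]

    # A generous limit lets the data through unchanged.
    within_limit = JSON.dump(nested, nil, 10)

    # A limit smaller than the actual depth raises ArgumentError.
    error = begin
      JSON.dump(nested, nil, 2)
      nil
    rescue ArgumentError => e
      e
    end

    puts within_limit
    puts error.class
    ```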

    Comparison to #to_json

    #to_json accepts options as arguments, so JSON::generate(foo, opts) and foo.to_json(opts) are mostly equivalent*.

    *The internal implementation differs, which can result in surprising effects, for example when a class or library redefines #to_json.
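
    For a plain hash with no custom #to_json in play, the two calls can be compared directly. This is only a sketch: the record and the option set are invented for the example, and the equality is not guaranteed once a library overrides #to_json.

    ```ruby
    require 'json'

    # Hypothetical record and formatting options.
    record = { "id" => 7, "tags" => ["x", "y"] }
    opts = { indent: "  ", space: " ", object_nl: "\n", array_nl: "\n" }

    # Same options, two entry points.
    via_generate = JSON.generate(record, opts)
    via_to_json = record.to_json(opts)

    puts via_generate
    puts via_generate == via_to_json
    ```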

    JSON::load vs JSON::parse

    Similar to ::dump calling ::generate internally, ::load calls ::parse internally. ::load, like ::dump, may also take an IO object, but again, the source is read all at once, so streaming is limited to trivial cases. However, unlike the ::dump/::generate duality, both ::load and ::parse accept options as part of their argument signatures.
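
    The following sketch shows both entry points reading the same made-up document: ::load from an IO-like object, and both methods with an option hash. For ::load the options go in the third position, after the optional proc. create_additions is switched off explicitly here on the assumption that ::load's defaults may enable it, which some versions of the library reject in combination with symbolize_names.

    ```ruby
    require 'json'
    require 'stringio'

    # Hypothetical input document.
    text = '{"name": "example", "tags": ["a", "b"]}'

    # parse takes a string; load also accepts an IO-like object.
    from_parse = JSON.parse(text)
    from_load = JSON.load(StringIO.new(text))

    # Options on parse are the second argument.
    symbolized_parse = JSON.parse(text, symbolize_names: true)

    # Options on load are the third argument (nil fills the proc slot).
    symbolized_load = JSON.load(text, nil, symbolize_names: true, create_additions: false)

    p from_parse
    p symbolized_load
    ```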

    ::load can also be passed a proc, which will be called on every Ruby object parsed from the data; it also comes with a warning that ::load should only be used with trusted data. ::parse has no such restriction, and therefore JSON::parse is the correct choice for parsing untrusted data sources like user inputs and files or streams with unknown contents.
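
    A small sketch of both points: a proc passed to ::load that records the objects it is handed, and ::parse treating a "json_class" key in untrusted input as ordinary data. The inputs and the collector lambda are made up for the example. The lambda returns its argument unchanged in case the library uses the return value, and the order in which objects arrive is not relied on.

    ```ruby
    require 'json'

    # Hypothetical trusted input.
    text = '{"a": [1, 2]}'

    # Collect every object the proc is called with.
    seen = []
    collector = lambda do |obj|
      seen << obj
      obj # hand the object back unchanged
    end
    loaded = JSON.load(text, collector)

    # Hypothetical untrusted input carrying a "json_class" key.
    untrusted = '{"json_class": "String", "raw": [104, 105]}'

    # With its default options, parse returns a plain Hash and does not
    # instantiate anything based on that key.
    parsed = JSON.parse(untrusted)

    p loaded
    p seen
    p parsed
    ```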