Tags: json, firebase, firebase-tools, jsonstream

Firebase Tools: How to avoid "out of memory" errors when setting the database with a big JSON file? How to stream JSON via firebase-tools?


We have a GitHub action that copies data from one Firebase project to another using the firebase-tools package (we are on the latest version, 9.11.0):

firebase use fromProject && firebase database:get / -o export.json
firebase use toProject && firebase database:set -y / export.json

This worked fine until our data grew larger; now we are getting the following error:

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

As a temporary fix, we’ve been able to apply the node --max-old-space-size flag, which simply increases the memory available to the Node process:

node --max-old-space-size=4096 /home/runner/work/foo/foo/node_modules/firebase database:set -y / export.json
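
An equivalent way to pass the flag, without hard-coding the path to the CLI script, should be Node’s NODE_OPTIONS environment variable:

NODE_OPTIONS="--max-old-space-size=4096" firebase database:set -y / export.json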

Considering our data will keep growing, we’d like to implement a proper fix, which as I understand it would be to set the data by streaming the JSON. However, I’m not sure firebase-tools allows that. Searching through GitHub issues didn’t yield anything useful.

Apart from streaming, perhaps another useful approach would be to split the huge JSON file into chunks and set them one by one?

Thanks!


Solution

  • We used HTTP streaming with the Firebase REST API to get around saving, modifying locally, and re-uploading a huge (over 256 MB) JSON file.

    We built a function that pipes the HTTP data stream from one project’s database to another:

    const https = require('https')

    async function copyFirebasePath(path, from, to) {
      // getAccessToken from https://firebase.google.com/docs/database/rest/auth
      const fromAccessToken = await getAccessToken(from.key)
      const toAccessToken = await getAccessToken(to.key)
    
      return new Promise((resolve, reject) => {
        // create the write request, but don’t start writing to it –
        // we’ll pipe the read request in as the data to write
        const toRequest = https.request(
          `${to.databaseUrl}/${path}.json?print=silent`,
          {
            method: 'PUT',
            headers: {
              Authorization: `Bearer ${toAccessToken}`
            }
          },
          (/* res */) => {
            resolve()
          }
        )
        // network errors surface as 'error' events, not thrown exceptions
        toRequest.on('error', reject)
    
        https
          .request(
            `${from.databaseUrl}/${path}.json`,
            {
              method: 'GET',
              headers: {
                Authorization: `Bearer ${fromAccessToken}`
              }
            },
            res => {
              // stream the GET response straight into the PUT request body
              res.pipe(toRequest)
              res.on('end', () => {
                toRequest.end()
              })
            }
          )
          .on('error', reject)
          .end()
      })
    }
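
    The getAccessToken helper is only referenced in a comment above; a minimal sketch using the google-auth-library package (assuming from.key / to.key hold paths to service-account JSON files) could look like this:

    const { JWT } = require('google-auth-library')

    // sketch of getAccessToken, following
    // https://firebase.google.com/docs/database/rest/auth
    // `keyFilePath` (a path to a service-account JSON file) is an assumption
    async function getAccessToken(keyFilePath) {
      const serviceAccount = require(keyFilePath)
      const client = new JWT({
        email: serviceAccount.client_email,
        key: serviceAccount.private_key,
        scopes: [
          'https://www.googleapis.com/auth/userinfo.email',
          'https://www.googleapis.com/auth/firebase.database'
        ]
      })
      const { access_token } = await client.authorize()
      return access_token
    }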
    

    And we use copyFirebasePath like this:

    // get the top-level keys of the remote db
    // (`request` is a promise-based HTTP helper; fromAccessToken is obtained
    // via getAccessToken, as in copyFirebasePath above)
    const shallowDB = await request({
      method: 'get',
      // note ?shallow=true here – prevents loading the whole db!
      url: `${from.databaseUrl}/.json?shallow=true`,
      headers: {
        Authorization: `Bearer ${fromAccessToken}`
      }
    })
    // shallow=true maps each child key to a placeholder, e.g. { users: true, posts: true }
    const dbKeys = Object.keys(shallowDB)
    const keysToOmit = ['foo', 'bar', 'baz']
    
    try {
      await Promise.all(
        dbKeys
          .filter(dbKey => !keysToOmit.includes(dbKey))
          .map(key => copyFirebasePath(key, from, to))
      )
    } catch (copyError) {
      console.error(copyError)
      // rethrow the original error instead of wrapping it in a new Error,
      // which would stringify it and lose the stack trace
      throw copyError
    }
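
    The from and to arguments are plain objects; the answer doesn’t show their exact shape, but from how they are used it would be something like this (hypothetical values):

    // hypothetical shape of the `from` / `to` arguments used above
    const from = {
      databaseUrl: 'https://from-project-default-rtdb.firebaseio.com',
      key: './from-project-service-account.json'
    }
    const to = {
      databaseUrl: 'https://to-project-default-rtdb.firebaseio.com',
      key: './to-project-service-account.json'
    }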