csvdata-modelingkeen-io

Import bulk data in to Keen.io, where one column is stringified JSON


I have around 4 million rows of data that I'd like to import in to Keen.io. The non-normalized CSV structure would look like what is attached here ... the "dimensions" column is a stringified JSON object. I see Keen.io has a batch importer for CSV. How would the "dimensions" column be handled by the importer? Would it turn it in to an object? What is your recommendation on how I should approach this? Thanks!


Solution

  • First lets structure your data in the form of event data, then prepare it to load into Keen.

    The link you shared looks like you have a table of data about fish (one row per fish) and another table of data about regulations. I'm assuming that if you were simply trying to store these tables somewhere, you'd use a SQL db or something like Firebase. Since you're interested in loading them into Keen IO I'll assume that you have some sort of actions related to this data that you're interested in analyzing, and that's what you mean by 4 million rows of data.

    The first step in loading the data is determine which actions related to these events that you are interested in recording.

    For this example, let's assume you're tracking each time a fish or regulation is viewed or saved. Our event collections would be:

    Each time one of those events happen, we'll post it into Keen in a format like this. We basically create an object with various relevant properties about the fish, the regulation, the user, etc:

    // Create an event data object with the properties you want to send
    // This is an example of sending an event using the Keen IO JavaScript SDK, but you could post the event from iOS, Android, or from the backend.
    // This example is an event that happens when someone views details about a fish
    
    var event = {
      gid: "be68ea99-8f3b-9s9f",
      fish: { // details about the fish
        id: "168",
        species_name: "Amberjack, Lesser"
      },
      user: { // details about the user
        id: "029aomsldsf9aadfm"
      },
      app_type: "web", // "iOS", "Android"
      keen: {  // time the event happened
        timestamp: new Date().toISOString()
      }
    };
    
    // Send it to the "view_fish" collection
    client.addEvent("view_fish", event);

    Ideally your app is instrumented so that these events are posted to Keen IO as they happen.

    It sounds like in this case that you also have some historical data to load. Check out the Keen IO bulk loading guide. To do this, you create CSVs for each of your historical events. The first row must be column names. After that you have one row per event:

    keen.timestamp, gid, fish.id, fish.species_name
    2017-06-13T19:11:05Z, ae68ea99-2f3b-9s9z, "168", "Amberjack, Lesser"
    2017-05-13T19:11:05Z, be88ea99-8f4b-3s9x, "44", "Scallop, Bay"
    2017-04-13T19:11:05Z, ce33ea99-5f3b-9s9b, "223", "Scallop, Calico"
    

    You'll have one CSV for each of your event collections, e.g. "view_fish", "save_fish", etc.

    Then you can load each of them using the Keen CLI:

    $ keen events:add --collection view_fish --file view_fish_events.csv --csv
    $ keen events:add --collection save_fish --file save_fish_events.csv --csv
    $ keen events:add --collection view_regulation --file view_regulation_events.csv --csv
    $ keen events:add --collection save_regulation --file save_regulation_events.csv --csv