c griddb

Rowkey for time-series container of GridDB based on gsCurrentTime()


I have input from a wide variety of sensors, each of which only ever produces one or two rows, so creating a new container per sensor makes little sense. The data arrives in an order that must not be lost, so I considered enumerating the input rows as they come in and assigning numbers accordingly. I then wanted to record additional information about the spacing between inputs. After first adjusting the IDs so they were no longer sequential, I am now considering timestamps as row keys, assigned at the moment the data is written into a row. I've found mentions, regarding other databases, that this can cause problems because the data then contains information that is not directly associated with it.
So essentially the row key is set by: gsSetRowFieldByTimestamp(row, 0, gsCurrentTime()); Would using this time function to supply the row key for a time series be appropriate? Are there any foreseeable issues, besides the possibly obvious one that this effectively bottlenecks insertion to the resolution of gsCurrentTime()?


Solution

  • First, even if a sensor only has a few columns, I believe the data schema should still be one container per device. Yes, it seems wasteful, but it is the GridDB way. GridDB needs multiple containers in order to partition data among its nodes when clustering is used. Using multi-query will negate any performance issues on the read side of your application.

    Now, if you insist on using a single container, it is important to note that your data collector must be single-threaded to avoid theoretical row key collisions, and yes, use gsCurrentTime() (or TimestampUtils.current in Java).
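
To illustrate the collision point: even a single-threaded collector can read the same timestamp twice if two rows arrive within the clock's resolution, and as far as I know a put with an existing row key replaces that row rather than adding a new one. Below is a minimal sketch of one way to guard against that, assuming keys are millisecond timestamps held in a 64-bit integer. `KeyGen` and `next_row_key` are hypothetical names for illustration, not part of the GridDB client API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper state, not part of the GridDB client API. */
typedef struct {
    int64_t last_key_ms; /* last key handed out, in milliseconds */
} KeyGen;

/*
 * Return a strictly increasing key based on the current time.
 * If the clock has not advanced past the previous key (or went
 * backwards), use the previous key plus one millisecond instead.
 * Not thread-safe: intended for a single-threaded collector.
 */
static int64_t next_row_key(KeyGen *gen, int64_t now_ms) {
    assert(gen != NULL);
    int64_t key = now_ms;
    if (key <= gen->last_key_ms) {
        key = gen->last_key_ms + 1;
    }
    gen->last_key_ms = key;
    return key;
}
```

With this in place, the call from the question would become something like gsSetRowFieldByTimestamp(row, 0, next_row_key(&gen, gsCurrentTime())). The trade-off is that during a burst the stored key can run slightly ahead of the real arrival time, so the keys preserve order exactly but spacing only approximately.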