I am trying to decide how measurements should be organised in an InfluxDB the database (which I believe they call schema design and data layout) but I think this may be a more general database type question.
Let's say as a simple example that I am measuring two quantites, temperature and humidity (imaginative, I know!), in two locations, living room and outside.
InfluxDB has the syntax for inserting data points:
measurement, tag_key=tag_value field_key=field_value
and so there are two obvious (at least to me) options. Briefly, the first option would insert a data point like this:
INSERT temperature,location=outside value=15
INSERT humidity,location=outside value=50
whereas the second option would do it this way:
INSERT sensor_measurements,location=outside temperature=15,humidity=50
My questions are more high level:
My own thoughts:
Option 1 seems to me to be more like what is implied by the InfluxDB description "measurement". Both temperature and humidity are separate quantities. But it seems a little clunky to just call it "value".
Option 2 appears to have the advantage that both the humidity and the temperature share exactly the same timestamp. This would come in useful, for example, if I wanted to import the data into some other software and do a correlation between the two quantites, and would mean I wouldn't have to do any interpolation or binning to get them to match up.
I am not sure if it is a bad idea with Option 2 to just have a general measurement called sensor_measurements, and will be hard to maintain later.
In detail:
At time t1, insert the data:
INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28
At time t2, insert some different data:
INSERT humidity,location=outside value=50
INSERT temperature,location=outside value=15
INSERT humidity,location=living_room value=65
INSERT temperature,location=living_room value=28
I can then get access to the living room temperature by querying the following:
> SELECT value FROM temperature WHERE location='living_room'
name: temperature
time value
---- -----
1590416682017481091 28
1590416723963187592 29
I can also use the group by function to do something like this:
SELECT value FROM temperature GROUP BY "location"
At time t1, insert the data:
INSERT sensor_measurements,location=outside temperature=15,humidity=50
INSERT sensor_measurements,location=living_room temperature=28,humidity=65
At time t2, insert some different data:
INSERT sensor_measurements,location=outside temperature=14,humidity=56
INSERT sensor_measurements,location=living_room temperature=29,humidity=63
I can now get access to the living room temperature by querying the following:
> SELECT temperature FROM sensor_measurements WHERE location='living_room'
name: sensor_measurements
time temperature
---- -----------
1590416731530452068 28
1590416757055629103 29
I can now use the group by function to do something like this:
SELECT temperature FROM sensor_measurements GROUP BY "location"
I would use option 2 from offered options, because less records = less resources = better query response time (in theory). Generally, both approaches look good.
But I will use more generic 3rd option in real world. Single generic metrics
measurement with tags metric,location
and field value
:
INSERT metrics,metric=temperature,location=outside value=15
INSERT metrics,metric=humidity,location=living_room value=50
INSERT metrics,metric=temperature,location=living_room value=28
INSERT metrics,metric=humidity,location=living_room value=65
That gives me opportunity to create single generic Grafana dashboard, where user will have option to select visualized metric/location via dashboard variable (generated directly from the InfluxDB, e.g. SHOW TAG VALUES WITH KEY = "metric"
). Any new inserted metrics (e.g. `illuminance , pressure, wind-speed, wind-direction, ...) or location can be immediately visualized in this generic dashboard. Eventually, some metrics may have also additional tags. That is good and I will be able to use ad-hoc Grafana variable, so users will be able specify any number of key/value filters on the fly. Grafana doc.