I have a table organization as below where temperature measurements are all stored in one column and there are 14 different temperature samples at different depths per timestamp.
select * from water_temp order by unixtimestamp desc limit 28;
id | unixtimestamp | depthname | depth | temperature |
---|---|---|---|---|
481042727209 | 1685770037 | 600ft | 600 | 38.5910000 |
481042727208 | 1685770037 | 500ft | 500 | 38.6960000 |
481042727207 | 1685770037 | 400ft | 400 | 38.6570000 |
481042727206 | 1685770037 | 300ft | 300 | 38.9040000 |
481042727205 | 1685770037 | 200ft | 200 | 39.3400000 |
481042727204 | 1685770037 | 150ft | 150 | 39.5030000 |
481042727203 | 1685770037 | 100ft | 100 | 40.2140000 |
481042727202 | 1685770037 | 80ft | 80 | 41.3850000 |
481042727201 | 1685770037 | 60ft | 60 | 44.1530000 |
481042727200 | 1685770037 | 50ft | 50 | 46.2590000 |
481042727199 | 1685770037 | 40ft | 40 | 47.2420000 |
481042727198 | 1685770037 | 30ft | 30 | 49.7390000 |
481042727197 | 1685770037 | 20ft | 20 | 55.3010000 |
481042727196 | 1685770037 | 10ft | 10 | 61.8840000 |
481041534602 | 1685769380 | 600ft | 600 | 38.5910000 |
481041534601 | 1685769380 | 500ft | 500 | 38.6400000 |
481041534600 | 1685769380 | 400ft | 400 | 38.7130000 |
481041534599 | 1685769380 | 300ft | 300 | 38.9040000 |
481041534598 | 1685769380 | 200ft | 200 | 39.2840000 |
481041534597 | 1685769380 | 150ft | 150 | 39.5600000 |
481041534596 | 1685769380 | 100ft | 100 | 40.2700000 |
481041534595 | 1685769380 | 80ft | 80 | 41.4410000 |
481041534594 | 1685769380 | 60ft | 60 | 44.3220000 |
481041534593 | 1685769380 | 50ft | 50 | 46.3150000 |
481041534592 | 1685769380 | 40ft | 40 | 47.3540000 |
481041534591 | 1685769380 | 30ft | 30 | 49.9070000 |
481041534590 | 1685769380 | 20ft | 20 | 56.1450000 |
481041534589 | 1685769380 | 10ft | 10 | 62.5030000 |
Hoping to get output in this format:
unixtimestamp | depth10 | depth20 | depth30 | ... |
---|---|---|---|---|
1685769380 | 62.5 | 56.14 | 49.91 | ... |
1685770037 | 61.8 | 55.3 | 49.74 | ... |
... | ... | ... | ... | ... |
I have inherited this table format. I have tried using distinct but can't find a way to have a distinct timestamp and depth. Thinking I may have to dump the data and use python to read and reorganize in a one row per timestamp.
Is there a straightforward SQL query to get the data to list in one row per timestamp with each depth temperature as columns? I want to output the data in csv format where each row is one timestamp.
Also is there some reason to organize data in a DB in the manner? I am tying to understand. Seems to me the more intuitive/useful way would be to have one column per temperature depth and one row per timestamp.
Thanks in advance.
Is there some reason to organize data in a DB in the manner? I am tying to understand. Seems to me the more intuitive/useful way would be to have one column per temperature depth and one row per timestamp.
You organize data in a way so that your queries are faster or to make data more consumable for your purposes. If having pivoted data make them more consumable for your needs, you should go for it.
Just to give you an example, if you had to get depth values when the temperature went higher than 40, attempting to do it with the output you're requiring, would likely force you to go through unpivoting, while it would be straight simple if you used your original table.
Conversely, if you need to do some kind of reporting, and observe only temperatures for specific depths (say depth 10 and depth 100), using the input table would require you a filtering, while using the output table can give you that instantly.
What you're looking for is called "pivot", and allows you to change rows into columns. Since in the output there are 14 columns dedicated to depth (from the data I assume they're always 14 for each timestamp), you should expect to put 14 computed fields inside the SELECT
statement, like:
SELECT unixtimestamp,
... AS depth10,
... AS depth20,
... AS depth30,
... AS depth40,
...
... AS depth600
FROM water_temp
Then you realize that you actually need to select your temperatures, but only when the depth corresponds to your selected depth. Something you can do with a CASE
expression (or an IF
, if we want):
SELECT unixtimestamp,
CASE WHEN depth = 10 THEN temperature END AS depth10,
CASE WHEN depth = 20 THEN temperature END AS depth20,
CASE WHEN depth = 30 THEN temperature END AS depth30,
CASE WHEN depth = 40 THEN temperature END AS depth40,
...
CASE WHEN depth = 600 THEN temperature END AS depth600
FROM water_temp
And when you attempt to do that, you realize you will have a kind of squared matrix, for which unique timestamp has now 14 records, each of which with one non-null value (on the corresponding depth column).
Since you wanted one record only, for each of your timestamps, one of the best options for reducing the size of tables is aggregation (as long as it "aggregates" data). You need to aggregate by grouping on your timestamps (because you want one record for each timestamp), and using an aggregate function that can compress one non-null value with other null values, and that's either MIN
or MAX
. Your code then becomes:
SELECT unixtimestamp,
MAX(CASE WHEN depth = 10 THEN temperature END) AS depth10,
MAX(CASE WHEN depth = 20 THEN temperature END) AS depth20,
MAX(CASE WHEN depth = 30 THEN temperature END) AS depth30,
MAX(CASE WHEN depth = 40 THEN temperature END) AS depth40,
...
MAX(CASE WHEN depth = 600 THEN temperature END) AS depth600
FROM water_temp
GROUP BY unixtimestamp
which gives you exactly what you were looking for.