sqlmysqlpivotdistinct-values

Reorder data from multiplexed column values to rows


I have a table organization as below where temperature measurements are all stored in one column and there are 14 different temperature samples at different depths per timestamp.

select * from water_temp order by unixtimestamp desc limit 28;
id unixtimestamp depthname depth temperature
481042727209 1685770037 600ft 600 38.5910000
481042727208 1685770037 500ft 500 38.6960000
481042727207 1685770037 400ft 400 38.6570000
481042727206 1685770037 300ft 300 38.9040000
481042727205 1685770037 200ft 200 39.3400000
481042727204 1685770037 150ft 150 39.5030000
481042727203 1685770037 100ft 100 40.2140000
481042727202 1685770037 80ft 80 41.3850000
481042727201 1685770037 60ft 60 44.1530000
481042727200 1685770037 50ft 50 46.2590000
481042727199 1685770037 40ft 40 47.2420000
481042727198 1685770037 30ft 30 49.7390000
481042727197 1685770037 20ft 20 55.3010000
481042727196 1685770037 10ft 10 61.8840000
481041534602 1685769380 600ft 600 38.5910000
481041534601 1685769380 500ft 500 38.6400000
481041534600 1685769380 400ft 400 38.7130000
481041534599 1685769380 300ft 300 38.9040000
481041534598 1685769380 200ft 200 39.2840000
481041534597 1685769380 150ft 150 39.5600000
481041534596 1685769380 100ft 100 40.2700000
481041534595 1685769380 80ft 80 41.4410000
481041534594 1685769380 60ft 60 44.3220000
481041534593 1685769380 50ft 50 46.3150000
481041534592 1685769380 40ft 40 47.3540000
481041534591 1685769380 30ft 30 49.9070000
481041534590 1685769380 20ft 20 56.1450000
481041534589 1685769380 10ft 10 62.5030000

Hoping to get output in this format:

unixtimestamp depth10 depth20 depth30 ...
1685769380 62.5 56.14 49.91 ...
1685770037 61.8 55.3 49.74 ...
... ... ... ... ...

I have inherited this table format. I have tried using distinct but can't find a way to have a distinct timestamp and depth. Thinking I may have to dump the data and use python to read and reorganize in a one row per timestamp.

Is there a straightforward SQL query to get the data to list in one row per timestamp with each depth temperature as columns? I want to output the data in csv format where each row is one timestamp.

Also is there some reason to organize data in a DB in the manner? I am tying to understand. Seems to me the more intuitive/useful way would be to have one column per temperature depth and one row per timestamp.

Thanks in advance.


Solution

  • Is there some reason to organize data in a DB in the manner? I am tying to understand. Seems to me the more intuitive/useful way would be to have one column per temperature depth and one row per timestamp.

    You organize data in a way so that your queries are faster or to make data more consumable for your purposes. If having pivoted data make them more consumable for your needs, you should go for it.

    Just to give you an example, if you had to get depth values when the temperature went higher than 40, attempting to do it with the output you're requiring, would likely force you to go through unpivoting, while it would be straight simple if you used your original table.

    Conversely, if you need to do some kind of reporting, and observe only temperatures for specific depths (say depth 10 and depth 100), using the input table would require you a filtering, while using the output table can give you that instantly.


    What you're looking for is called "pivot", and allows you to change rows into columns. Since in the output there are 14 columns dedicated to depth (from the data I assume they're always 14 for each timestamp), you should expect to put 14 computed fields inside the SELECT statement, like:

    SELECT unixtimestamp,
           ... AS depth10,
           ... AS depth20,
           ... AS depth30,
           ... AS depth40,
           ...
           ... AS depth600
    FROM water_temp 
    

    Then you realize that you actually need to select your temperatures, but only when the depth corresponds to your selected depth. Something you can do with a CASE expression (or an IF, if we want):

    SELECT unixtimestamp,
           CASE WHEN depth =  10 THEN temperature END AS depth10,
           CASE WHEN depth =  20 THEN temperature END AS depth20,
           CASE WHEN depth =  30 THEN temperature END AS depth30,
           CASE WHEN depth =  40 THEN temperature END AS depth40,
           ...
           CASE WHEN depth = 600 THEN temperature END AS depth600
    FROM water_temp 
    

    And when you attempt to do that, you realize you will have a kind of squared matrix, for which unique timestamp has now 14 records, each of which with one non-null value (on the corresponding depth column).

    Since you wanted one record only, for each of your timestamps, one of the best options for reducing the size of tables is aggregation (as long as it "aggregates" data). You need to aggregate by grouping on your timestamps (because you want one record for each timestamp), and using an aggregate function that can compress one non-null value with other null values, and that's either MIN or MAX. Your code then becomes:

    SELECT unixtimestamp,
           MAX(CASE WHEN depth =  10 THEN temperature END) AS depth10,
           MAX(CASE WHEN depth =  20 THEN temperature END) AS depth20,
           MAX(CASE WHEN depth =  30 THEN temperature END) AS depth30,
           MAX(CASE WHEN depth =  40 THEN temperature END) AS depth40,
           ...
           MAX(CASE WHEN depth = 600 THEN temperature END) AS depth600
    FROM water_temp 
    GROUP BY unixtimestamp
    

    which gives you exactly what you were looking for.