I have TableA and Table B as shown below. Currently, as you see in my query below, TableA joins TableB based on box_id and with all rows in TableB whose timestamp is within 1 minute of TableA timestamp.
I would like to modify this query such that instead of selecting TableB rows that are within 1 minute of TableA row, I would like to select TableB row that either has an equivalent timestamp or whose timestamp is closest+less than the timestamp in TableA.
Below is an example of TableA and TableB along with the Desired Table I would like to have. Once that is achieved, I would not be grouping by rainfall, hail, weather as it would not be necessary. Any idea/help would be highly appreciated.
TableA:
id box_id timestamp
373001645 1 2020-05-07 06:00:20
373001695 1 2020-05-07 06:02:26
373001762 1 2020-05-07 06:05:17
373001794 1 2020-05-07 06:06:38
373001810 2 2020-05-07 06:07:21
TableB:
id box_id timestamp data
373001345 1 2020-05-07 06:00:20 {"R": 0.114, "H": 20.808}
373001395 1 2020-05-07 06:02:26 {"R": 0.12, "H": 15.544}
373001462 1 2020-05-07 06:03:01 {"R": 0.006, "H": 55.469}
373001494 1 2020-05-07 06:04:38 {"R": 0.004, "H": 51.85}
373001496 1 2020-05-07 06:05:18 {"R": 0.02, "H": 5.8965}
373001497 1 2020-05-07 06:06:39 {"R": 0.12, "H": 54.32}
373001510 2 2020-05-07 06:07:09 {"R": 0.34, "H": 1.32}
373001511 2 2020-05-07 06:07:29 {"R": 0.56, "H": 32.7}
Desired Table:
id timestamp rainfall hail weather
373001345 2020-05-07 06:00:20 0.114 20.808 {"Rainfall": 0.114, "Hail": 20.808}
373001395 2020-05-07 06:02:26 0.12 15.544 {"Rainfall": 0.12, "Hail": 15.544}
373001462 2020-05-07 06:05:17 0.004 51.85 {"Rainfall": 0.004, "Hail": 51.85}
373001494 2020-05-07 06:06:38 0.02 5.8965 {"Rainfall": 0.02, "Hail": 5.8965}
373001496 2020-05-07 06:07:21 0.34 1.32 {"Rainfall": 0.34, "Hail": 1.32}
Query:
SELECT tableA.id, tableA.timestamp AS timestamp,
(tableB.data->'$.R') as rainfall, (tableB.data->'$.H') as hail,
JSON_OBJECT("Hail", (tableB.data->'$.H'),"Rainfall", (tableB.data->'$.R')) AS weather
FROM tableA tableA
LEFT JOIN tableB tableB ON tableA.box_id = tableB.box_id
AND TIMESTAMPDIFF(MINUTE, tableB.timestamp, tableA.timestamp) BETWEEN -1 AND 1
WHERE
tableA.timestamp BETWEEN '2020-05-07 00:00:00' AND '2020-05-07 23:59:59'
GROUP BY tableA.id,rainfall,hail,weather
ORDER BY tableA.timestamp ASC;
You can use a correlated subquery in the ON condition to find the maximum timestamp in tableB which is less than or equal to the current rows timestamp from tableA, and JOIN based on that:
SELECT A.id,
A.timestamp AS timestamp,
B.data->'$.R' as rainfall,
B.data->'$.H' as hail,
JSON_OBJECT("Hail", B.data->'$.H',
"Rainfall", B.data->'$.R'
) AS weather
FROM tableA A
LEFT JOIN tableB B ON B.box_id = A.box_id
AND B.timestamp = (SELECT MAX(timestamp)
FROM tableB b2
WHERE b2.box_id = A.box_id
AND b2.timestamp <= A.timestamp)
ORDER BY A.timestamp
Output:
id timestamp rainfall hail weather
373001645 2020-05-07 06:00:20 0.114 20.808 {"Hail": 20.808, "Rainfall": 0.114}
373001695 2020-05-07 06:02:26 0.12 15.544 {"Hail": 15.544, "Rainfall": 0.12}
373001762 2020-05-07 06:05:17 0.004 51.85 {"Hail": 51.85, "Rainfall": 0.004}
373001794 2020-05-07 06:06:38 0.02 5.8965 {"Hail": 5.8965, "Rainfall": 0.02}
373001810 2020-05-07 06:07:21 0.34 1.32 {"Hail": 1.32, "Rainfall": 0.34}