sqlmysqlmysql-8.0

How to find data sets for same time range across multiple products


I have a database where I store a 1-minute snapshot for various exchange's product's best bid and ask. I have a requirement where I need to find:

  1. 500 best bids and ask across 3 products (product1 & exch1, product2 & exch2,product3 & exch3)
  2. All the 500 datapoints should fall within certain time-range e.g. 0900-1014, 1030-1129
  3. All 3 products should have same timestamp, i.e., if prod1 doesn't have 1030 datapoint, the query should ignore prod2 and prod3 <-- This is where I am stuck

I am using MySQL 8.0.35 on Ubuntu 20.04

My table has the following structure

symbol(varchar)
exchange(varchar)
bid_price(double)
ask_price(double)
timestamp(bigint) # actual best bid ask received time from exchange (millisecond precision)
updated_at(bigint) # minutes precision stored as seconds since epoch e.g. 1702954500102
symbol,exchange,bid_price,ask_price,timestamp,updated_at
prod1,exch1,555.7,555.8,1702956540603,1702956540
prod1,exch1,555.8,555.9,1702956478591,1702956480
prod1,exch1,555.9,556,1702956420102,1702956420
prod1,exch1,556,556.1,1702956360610,1702956360
prod1,exch1,556,556.1,1702956299607,1702956300
prod1,exch1,556.1,556.2,1702956239595,1702956240
prod1,exch1,556.2,556.3,1702956179597,1702956180
prod1,exch1,556,556.1,1702956122111,1702956120
prod1,exch1,556,556.1,1702956061081,1702956060
prod1,exch1,555.8,555.9,1702955998590,1702956000
prod2,exch2,7.1041,7.1042,1702956600942,1702956600
prod2,exch2,7.104,7.1042,1702956541012,1702956540
prod2,exch2,7.1043,7.1045,1702956480465,1702956480
prod2,exch2,7.1043,7.1044,1702956420617,1702956420
prod2,exch2,7.1043,7.1044,1702956360264,1702956360
prod2,exch2,7.1037,7.1038,1702956300269,1702956300
prod2,exch2,7.1039,7.1041,1702956239092,1702956240
prod2,exch2,7.1041,7.1043,1702956180001,1702956180
prod2,exch2,7.1043,7.1045,1702956123891,1702956120
prod2,exch2,7.1045,7.1047,1702956060588,1702956060
prod3,exch3,72.78,72.79,1702956600332,1702956600
prod3,exch3,72.79,72.8,1702956540698,1702956540
prod3,exch3,72.8,72.81,1702956480542,1702956480
prod3,exch3,72.79,72.81,1702956420228,1702956420
prod3,exch3,72.8,72.81,1702956352133,1702956360
prod3,exch3,72.81,72.82,1702956287733,1702956300
prod3,exch3,72.81,72.82,1702956239441,1702956240
prod3,exch3,72.82,72.83,1702956179253,1702956180
prod3,exch3,72.81,72.82,1702956124140,1702956120
prod3,exch3,72.81,72.82,1702956058999,1702956060

With following query I am able to do requirement #1 and #2, but #3 is something I am unable to figure out yet

(SELECT *, FROM_UNIXTIME(updated_at, '%H%i') AS u_time
FROM algo_system.bbo_1min
WHERE
(
(symbol = 'prod1' AND exchange = 'exch1')
OR
(symbol = 'prod2' AND exchange = 'exch2')
OR
(symbol = 'prod3' AND exchange = 'exch3')
)
HAVING
(u_time >= 900 AND u_time <= 1014)
OR
(u_time >= 1030 AND u_time <= 1129)
ORDER BY TIMESTAMP DESC LIMIT 500)

Edit: From the image below, as can see the highlighted red section has prod2 and prod3 data for 21:00 but prod1 doesn't have, I want to remove the prod2 and prod3 data if there is no datapoint for prod1 at 21:00 enter image description here


Solution

  • A sub join or a where exists query may work. Do consider performance also -- I have no idea if the sub query will slow your work down too much.

    Apologies for not answering in MySQL.

    Here is an example in MsSQL using an inner join -- you can see where I mildly adapted to account for the difference between MsSQL and MySQL. You should be able to work with the idea though.

    SELECT 
        symbol,
        exchange,
        bid_price,
        ask_price,
        --timestamp,
        FORMAT(dateadd(S, convert(bigint, timestamp) / 1000, '1970-01-01'),'hhmm')+700 AS timestamp,
        --updated_at
        FORMAT(dateadd(S, convert(INT, updated_at), '1970-01-01'),'hhmm')+700 AS updated_at
    into #tmp_db
    FROM(
        values 
        ('prod1','exch1',7.1041,7.1042, '1702956600942',    '1702956600'), -- 1030 for testing
        ('prod1','exch1',555.7,555.8,   '1702956540603',    '1702956540'),
        ('prod1','exch1',555.8,555.9,   '1702956478591',    '1702956480'),
        ('prod1','exch1',555.9,556,     '1702956420102',    '1702956420'),
        ('prod1','exch1',556,556.1,     '1702956360610',    '1702956360'),
        ('prod1','exch1',556,556.1,     '1702956299607',    '1702956300'),
        ('prod1','exch1',556.1,556.2,   '1702956239595',    '1702956240'),
        ('prod1','exch1',556.2,556.3,   '1702956179597',    '1702956180'),
        ('prod1','exch1',556,556.1,     '1702956122111',    '1702956120'),
        ('prod1','exch1',556,556.1,     '1702956061081',    '1702956060'),
        ('prod1','exch1',555.8,555.9,   '1702955998590',    '1702956000'),
        ('prod2','exch2',7.1041,7.1042, '1702956600942',    '1702956600'), -- 1030 for testing
        ('prod2','exch2',7.104,7.1042,  '1702956541012',    '1702956540'),
        ('prod2','exch2',7.1043,7.1045, '1702956480465',    '1702956480'),
        ('prod2','exch2',7.1043,7.1044, '1702956420617',    '1702956420'),
        ('prod2','exch2',7.1043,7.1044, '1702956360264',    '1702956360'),
        ('prod2','exch2',7.1037,7.1038, '1702956300269',    '1702956300'),
        ('prod2','exch2',7.1039,7.1041, '1702956239092',    '1702956240'),
        ('prod2','exch2',7.1041,7.1043, '1702956180001',    '1702956180'),
        ('prod2','exch2',7.1043,7.1045, '1702956123891',    '1702956120'),
        ('prod2','exch2',7.1045,7.1047, '1702956060588',    '1702956060'),
        ('prod3','exch3',72.78,72.79,   '1702956600332',    '1702956600'), -- 1030 for testing
        ('prod3','exch3',72.79,72.8,    '1702956540698',    '1702956540'),
        ('prod3','exch3',72.8,72.81,    '1702956480542',    '1702956480'),
        ('prod3','exch3',72.79,72.81,   '1702956420228',    '1702956420'),
        ('prod3','exch3',72.8,72.81,    '1702956352133',    '1702956360'),
        ('prod3','exch3',72.81,72.82,   '1702956287733',    '1702956300'),
        ('prod3','exch3',72.81,72.82,   '1702956239441',    '1702956240'),
        ('prod3','exch3',72.82,72.83,   '1702956179253',    '1702956180'),
        ('prod3','exch3',72.81,72.82,   '1702956124140',    '1702956120'),
        ('prod3','exch3',72.81,72.82,   '1702956058999',    '1702956060')
    
    ) AS x (symbol, exchange, bid_price, ask_price, timestamp, updated_at)
    
    
    
    SELECT top 500
        *
        --FROM_UNIXTIME(updated_at, '%H%i') AS u_time
    FROM #tmp_db
    JOIN (
            SELECT
                count(DISTINCT symbol) as product_count,
                updated_at as sub_updated_at
            FROM #tmp_db
            WHERE 
                (
                    (symbol = 'prod1' AND exchange = 'exch1')
                    OR
                    (symbol = 'prod2' AND exchange = 'exch2')
                    OR
                    (symbol = 'prod3' AND exchange = 'exch3')
                )
            GROUP BY updated_at
        ) as x on x.product_count = 3 and x.sub_updated_at = updated_at
    WHERE
        (
            (symbol = 'prod1' AND exchange = 'exch1')
            OR
            (symbol = 'prod2' AND exchange = 'exch2')
            OR
            (symbol = 'prod3' AND exchange = 'exch3')
        )
    --HAVING
    --  (updated_at >= 900 AND updated_at <= 1014)
    --  OR
    --  (updated_at >= 1030 AND updated_at <= 1129)
    and (
            (updated_at >= 900 AND updated_at <= 1014)
            OR
            (updated_at >= 1030 AND updated_at <= 1129)
        )
    ORDER BY updated_at DESC --LIMIT 500
    
    
    
    drop table #tmp_db
    

    all three if the prod1 is included enter image description here

    none if the prod1 is not showing enter image description here

    Updated: added "DISTINCT symbol" to ensure all three products are counted, in case a prod can show more than once.