Tags: python, algorithm, intervals

Number of outstanding shares per query


I came across this question and I was able to solve it, but inefficiently. My solution is a naive O(m*n) approach, where m is the number of queries and n is the number of orders.

You are given a log (a std::vector) of orders sent from our trading system over a day. Each order has the following properties:

  • order_token: unique integer identifying the order
  • shares: number of shares to buy or sell
  • price: price to buy or sell each share at
  • side: false = sell, true = buy
  • created_at: timestamp when the order was created
  • cancelled_or_executed_at: timestamp when the order was cancelled or executed (filled)

An order is live in the interval [created_at, cancelled_or_executed_at). That is, created_at is inclusive and cancelled_or_executed_at is exclusive. Timestamps are represented as integers, e.g. milliseconds since midnight. You may assume each order is cancelled or executed in full.

In addition to orders, you are also given a vector of queries. Each query has one field: query_time, a timestamp. The answer to the query is the total number of shares outstanding among all orders which were live at the query time. Outstanding shares are aggregated regardless of side, e.g. open orders to Buy 10 and Sell 10 aggregate to 20 shares live.

I was wondering if anyone has a better way to optimize my solution below, with a different data structure or algorithm; I'm sure there is one. It's a C++ question, but I wrote my solution in Python for convenience.

def calculate_outstanding_shares(orders, queries):
    result = {}

    for query in queries:
        live_trades = 0
        for order in orders:
            if order[4] <= query < order[5]:  # created_at inclusive, end exclusive
                live_trades += order[1]
                
        result[query] = live_trades

    return result


# Example usage
orders = [
    [3, 15, 200, True, 2000, 4000],
    [1,10,100,True,0,5000],
    [4, 25, 250, False, 2500, 6000],
    [2,20,150,False,1000,3000],
]

queries = [
    500,  # Before any order
    1500,  # Inside the duration of the first buy order
    2500,  # Inside the duration of both buy orders and the start of the second sell order
    3500,  # Inside the duration of the second sell order and after the first sell order ends
    5500  # After all orders have ended
]

result = calculate_outstanding_shares(orders, queries)
print(result)

Solution

  • You can pre-process the orders list into a sorted list of time points (each order's start and end) with the running total of outstanding shares at each one.

    Then, for each query, use the bisect module to binary-search for the correct position in time. That brings the cost down to O(m log n) for the queries, plus O(n log n) to build and sort the event list.

    from bisect import bisect
    
    
    def calculate(orders, queries):
        events = []
    
        # One +shares event when an order becomes live, one -shares event
        # when it is cancelled or executed.
        for _, shares, _, _, created, ended in orders:
            events.append((created, shares))
            events.append((ended, -shares))
    
        events.sort()
    
        # Turn the events into running totals: shares outstanding after all
        # events up to and including each timestamp.
        c = 0
        outstanding = [(t, (c := c + amt)) for t, amt in events]
    
        result = {}
        for q in queries:
            # (q, inf) sorts after every event at time q, so orders created
            # exactly at q are counted (created_at is inclusive). A query
            # before the first event (i == 0) has nothing outstanding.
            i = bisect(outstanding, (q, float("inf")))
            result[q] = outstanding[i - 1][1] if i else 0
        return result
    
    
    # Example usage
    orders = [
        [3, 15, 200, True, 2000, 4000],
        [1, 10, 100, True, 0, 5000],
        [4, 25, 250, False, 2500, 6000],
        [2, 20, 150, False, 1000, 3000],
    ]
    
    queries = [
        500,  # Before any order
        1500,  # Inside the duration of the first buy order
        2500,  # Inside the duration of both buy orders and the start of the second sell order
        3500,  # Inside the duration of the second sell order and after the first sell order ends
        5500,  # After all orders have ended
    ]
    
    print(calculate(orders, queries))
    

    Prints:

    {500: 10, 1500: 30, 2500: 70, 3500: 50, 5500: 25}
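  • If all the queries are known up front, you can also answer them offline with no binary search at all: sort the queries too and sweep a single pointer over the event list. A sketch under that assumption (calculate_offline is a made-up name; it uses the same orders/queries layout as above):

```python
def calculate_offline(orders, queries):
    # Same +shares / -shares events as the bisect solution.
    events = []
    for _, shares, _, _, created, ended in orders:
        events.append((created, shares))
        events.append((ended, -shares))
    events.sort()

    result = {}
    running = 0
    i = 0
    # Visit queries in time order; the event pointer only ever moves
    # forward, so the sweep is O(n + m) after the two sorts.
    for q in sorted(queries):
        # Apply every event at or before q: starts at exactly q count
        # (created_at inclusive), ends at exactly q drop out (exclusive).
        while i < len(events) and events[i][0] <= q:
            running += events[i][1]
            i += 1
        result[q] = running
    return result
```

    This trades the O(m log n) lookups for an O(m log m) sort of the queries, which is usually a wash in big-O terms, but the sweep touches each event exactly once.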