I came across this question and was able to solve it, but inefficiently. My solution is a naive O(m*n), where m is the number of queries and n is the number of orders.
You are given a log (std::vector) of orders sent from our trading system over a day. Each order has the following properties:
- order_token: unique integer identifying the order
- shares: number of shares to buy or sell
- price: price to buy or sell each share at
- side: false = sell, true = buy
- created_at: timestamp when the order was created
- cancelled_or_executed_at: timestamp when the order was cancelled or executed (filled)
An order is live in the interval [created_at, cancelled_or_executed_at). That is, created_at is inclusive and cancelled_or_executed_at is exclusive. Timestamps are represented as integers, e.g. milliseconds since midnight. You may assume each order is cancelled or executed in full.
In addition to orders, you are also given a vector of queries. Each query has one field: query_time, a timestamp. The answer to the query is the total number of shares outstanding among all orders which were live at the query time. Outstanding shares are aggregated regardless of side, e.g. open orders to Buy 10 and Sell 10 aggregate to 20 shares live.
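For concreteness, the half-open liveness rule can be written as a small helper (a sketch; the dict field names just mirror the property list above):

```python
def is_live(order, t):
    # [created_at, cancelled_or_executed_at): start inclusive, end exclusive.
    return order["created_at"] <= t < order["cancelled_or_executed_at"]

order = {"order_token": 1, "shares": 10, "price": 100, "side": True,
         "created_at": 0, "cancelled_or_executed_at": 5000}
print(is_live(order, 0))     # True: created_at is inclusive
print(is_live(order, 5000))  # False: the end timestamp is exclusive
```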
I was wondering if anyone had a better way to optimize my solution below with a different data structure or algorithm; I'm sure there is one. It's a C++ question, but I wrote my solution in Python for convenience.
def calculate_outstanding_shares(orders, queries):
    result = {}
    for query in queries:
        live_shares = 0
        for order in orders:
            # Live in [created_at, cancelled_or_executed_at):
            # start inclusive, end exclusive.
            if order[4] <= query < order[5]:
                live_shares += order[1]
        result[query] = live_shares
    return result
# Example usage
orders = [
    [3, 15, 200, True, 2000, 4000],
    [1, 10, 100, True, 0, 5000],
    [4, 25, 250, False, 2500, 6000],
    [2, 20, 150, False, 1000, 3000],
]
queries = [
    500,   # Before any order
    1500,  # Inside the duration of the first buy order
    2500,  # Inside the duration of both buy orders and the start of the second sell order
    3500,  # Inside the duration of the second sell order and after the first sell order ends
    5500,  # After all orders have ended
]
result = calculate_outstanding_shares(orders, queries)
print(result)
You can pre-process the orders list: collect every start/end time point and compute the running total of outstanding shares at each one. Then answer each query with the bisect module, binary-searching for the correct time position. Preprocessing is O(n log n) and each query is O(log n), so the total is O((m + n) log n) instead of O(m * n).
from bisect import bisect

def calculate(orders, queries):
    # Events: +shares when an order starts, -shares when it ends.
    events = []
    for _token, shares, _price, _side, start, end in orders:
        events.append((start, shares))
        events.append((end, -shares))
    events.sort()

    # Prefix-sum the deltas: each entry holds the outstanding total from
    # that timestamp until the next event.
    c = 0
    outstanding_shares = [(t, (c := c + amt)) for t, amt in events]

    def shares_at(q):
        # (q, inf) sorts after every event at time q, so starts at exactly
        # q are counted (created_at inclusive) and ends at exactly q are
        # not (exclusive). A query before the first event returns 0.
        i = bisect(outstanding_shares, (q, float("inf")))
        return outstanding_shares[i - 1][1] if i else 0

    return {q: shares_at(q) for q in queries}

# Example usage
orders = [
    [3, 15, 200, True, 2000, 4000],
    [1, 10, 100, True, 0, 5000],
    [4, 25, 250, False, 2500, 6000],
    [2, 20, 150, False, 1000, 3000],
]
queries = [
    500,   # Before any order
    1500,  # Inside the duration of the first buy order
    2500,  # Inside the duration of both buy orders and the start of the second sell order
    3500,  # Inside the duration of the second sell order and after the first sell order ends
    5500,  # After all orders have ended
]
print(calculate(orders, queries))
Prints:
{500: 10, 1500: 30, 2500: 70, 3500: 50, 5500: 25}
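If all queries are known up front (as they are here), you can skip the per-query binary search entirely: sort the queries as well and sweep through the events with a single pointer, for O((m + n) log(m + n)) total. A sketch (the function name calculate_offline is mine), treating created_at as inclusive per the problem statement:

```python
def calculate_offline(orders, queries):
    # Build (timestamp, delta) events: +shares at created_at,
    # -shares at cancelled_or_executed_at.
    events = []
    for _token, shares, _price, _side, start, end in orders:
        events.append((start, shares))
        events.append((end, -shares))
    events.sort()

    result = {}
    live = 0
    i = 0
    for q in sorted(set(queries)):
        # Apply every event at or before the query time: an order starting
        # exactly at q is live (inclusive), one ending at q is not (exclusive).
        while i < len(events) and events[i][0] <= q:
            live += events[i][1]
            i += 1
        result[q] = live
    return result
```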