I have two tables with order information, and am looking for a more efficient way to find whether or not each order in one table has crossing potential with the other.
Example tables:
t1:([]date: 2025.04.08 2025.04.08 2025.04.08 ;RIC:`a`b`c;Side:`buy`sell`buy;StartQty:100 1000 400; LimitPx: 100.00 20.00 15.00 ;AckTime:09:31 09:40 09:40; EndTime:09:33 09:45 09:41);
t2:([]date: 2025.04.08 2025.04.08 2025.04.08 ;RIC:`a`b`c;Side:`sell`buy`buy;StartQty:90 1000 400; LimitPx: 100.00 20.00 15.00 ;AckTime:09:30 09:40 09:41; EndTime:09:32 09:45 09:42);
I currently have a function to iterate through each entry in t1 and query t2, and check if the side is opposite, and the time range overlaps, and if so then output the crossed size. Expected result for these table would be:
ric a can cross 90 shares, ric b can cross all 1000, and ric c cannot cross.
And my current function is:
func:{[row]
t:select from t2 where date=(row[`date][0]), RIC=(row[`RIC][0]), not Side=(row[`Side][0]), ((AckTime within ((row[`AckTime][0]);(row[`EndTime][0])))|(EndTime within ((row[`AckTime][0]);(row[`EndTime][0]))));
crossedNotional:?[(count t)>0;(min((exec max StartQty from t);row[`StartQty][0]))*row[`LimitPx][0];0];
contra:select from t where StartQty=max StartQty;
res:select date, RIC, AckTime, EndTime, StartQty, crossedNotional:crossedNotional from row;
res:lj[res;`date xkey select date, contraSide:Side, contraAckTime:AckTime, contraEndTime:EndTime, contraStartQty:StartQty from contra];
:res
};
func2:{[table;x] func[enlist table[x]]};
test: ((,/) func2[t1;] peach (0+til (count t1)));
This works well for small sizes of t1 and/or t2, but the time complexity of the function is quite poor to expand to larger data sets. Is there a more efficient way to do this? Could an aj on AckTime achieve the same result; can you add filters to the aj on RIC/date/Side/etc.? Is there a way to add partition on date / RIC for tables loaded into memory? Anything else that looks inefficient with this?
Thank you!
EDIT: this is an expansion of my earlier question here: Join 2 tables based on row-specific time frames (crossing order information) KDB+/Q
In my opinion your t1 and t2 tables aren't in the right shape to be made efficient. Having two separate columns for start time and end time rules out the use of kdbs most efficient operations (asof join, window join etc).
One possibility is to reshape your t2 to make it more amenable to asof join. Here's an idea to get the ball rolling (note - I've stripped this example down to one symbol and side, the exercise is left to the reader to expand to multiple symbols and sides via good use of "by sym,side"):
q)t2:([]ric:`a;size:100*9 2 10 6 5;startTime:09:30 09:32 09:33 09:40 09:45; endTime:09:35 09:34 09:38 09:43 09:50);
q)prep1:ungroup select time:asc distinct raze(startTime,'endTime)by ric from t2;
q)prep2:2!select ric,time:startTime,size,endTime,i1:i from t2;
q)prep3:2!select ric,time:endTime,rmv:neg size,i2:i from t2;
q)show preppedt2:update avail:sums(0^size)+0^rmv,t2idx:{(x,y)except 0N,z}\[();i1;i2] from (lj/)(prep1;prep2;prep3)
ric time size endTime i1 rmv i2 avail t2idx
-------------------------------------------------
a 09:30 900 09:35 0 900 ,0
a 09:32 200 09:34 1 1100 0 1
a 09:33 1000 09:38 2 2100 0 1 2
a 09:34 -200 1 1900 0 2
a 09:35 -900 0 1000 ,2
a 09:38 -1000 2 0 `long$()
a 09:40 600 09:43 3 600 ,3
a 09:43 -600 3 0 `long$()
a 09:45 500 09:50 4 500 ,4
a 09:50 -500 4 0 `long$()
Now with this t2 in much better shape you can asof join
q)t1:([]ric:`a;time:09:32 09:34 09:38 09:35;qty:4?3000);
q)aj[`ric`time;t1;select ric,time,avail,t2idx from preppedt2]
ric time qty avail t2idx
-----------------------------
a 09:32 2956 1100 0 1
a 09:34 1384 1900 0 2
a 09:38 1281 0 `long$()
a 09:35 1882 1000 ,2
Disclaimer I have not tested this on large data. In theory it should be faster than iterating through every row of a table, since it's using mostly left joins, aj and whole-table operations (albeit with some sums/scans).
Also, some attributes (e.g. `g# on ric) would improve performance of the final aj