I’m working with a trade dataset containing timestamps, quantities, and prices. I need to calculate the time difference between first and last trade in each window.
Here’s my Python implementation using pandas:
import pandas as pd
data = {
'timestamp': pd.to_datetime(['09:34:07', '09:36:42', '09:36:51', '09:36:59', '09:32:47', '09:35:26', '09:34:16', '09:34:26', '09:38:12'], format='%H:%M:%S'),
'sym': ['C', 'MS', 'MS', 'MS', 'IBM', 'IBM', 'C', 'C', 'C'],
'qty': [2200, 1900, 2100, 3200, 6800, 5400, 1300, 2500, 8800],
'price': [49.6, 29.46, 29.52, 30.02, 174.97, 175.23, 50.76, 50.32, 51.29]
}
df = pd.DataFrame(data)
window_size = 2
# Calculate time differences
def calculate_time_delta(group, window_size):
tDelta = [pd.Timedelta(0)] * len(group)
for i in range(window_size - 1, len(group)):
tDelta[i] = group['timestamp'].iloc[i] - group['timestamp'].iloc[i - window_size + 1]
return pd.Series(tDelta, index=group.index)
df['tDelta'] = df.groupby('sym').apply(lambda group: calculate_time_delta(group, window_size)).reset_index(level=0, drop=True)
print(df)
#output is :
timestamp sym qty price tDelta
0 1900-01-01 09:34:07 C 2200 49.60 0 days 00:00:00
1 1900-01-01 09:36:42 MS 1900 29.46 0 days 00:00:00
2 1900-01-01 09:36:51 MS 2100 29.52 0 days 00:00:09
3 1900-01-01 09:36:59 MS 3200 30.02 0 days 00:00:08
4 1900-01-01 09:32:47 IBM 6800 174.97 0 days 00:00:00
5 1900-01-01 09:35:26 IBM 5400 175.23 0 days 00:02:39
6 1900-01-01 09:34:16 C 1300 50.76 0 days 00:00:09
7 1900-01-01 09:34:26 C 2500 50.32 0 days 00:00:10
8 1900-01-01 09:38:12 C 8800 51.29 0 days 00:03:46
Now I need to implement this in DolphinDB. My table is defined as:
sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price = 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price)
I tried the following approach in DolphinDB:
// Define a custom aggregator for time delta
defg timeDelta(windowTimestamps) {
return last(windowTimestamps) - first(windowTimestamps)
}
// Calculate rolling tDelta with window_size=2, grouped by sym
result = select
timestamp,
sym,
rolling(timeDelta, timestamp, 2, 1) as tDelta
from t1
context by sym
However, when I execute this code, I encounter the following error:
The length of the vector returned by the UDF specified in
SELECT must be the same as the length of the CONTEXT BY column
I’m unsure how to fix this error. Given that my actual dataset is quite large, I’m especially interested in the most efficient way to calculate the time differences between the first and last trade in each window for each stock symbol in DolphinDB.
You can try using mlast funciton(Gets the last timestamp in the current window) and mfirst function(Gets the first timestamp in the current window) :
windowSize = 2
// Main query statement
select
sym,
// Calculate the time difference between last and first timestamp in each window
mlast(timestamp, windowSize) - mfirst(timestamp, windowSize) as tDelta
from t1
context by sym //Group calculations by symbol column