pythonpandasquantitative-financedolphindb

How to calculate rolling time difference between first and last timestamp in a sliding window


I’m working with a trade dataset containing timestamps, quantities, and prices. I need to calculate the time difference between first and last trade in each window.

Here’s my Python implementation using pandas:

import pandas as pd

data = {
    'timestamp': pd.to_datetime(['09:34:07', '09:36:42', '09:36:51', '09:36:59', '09:32:47', '09:35:26', '09:34:16', '09:34:26', '09:38:12'], format='%H:%M:%S'),
    'sym': ['C', 'MS', 'MS', 'MS', 'IBM', 'IBM', 'C', 'C', 'C'],
    'qty': [2200, 1900, 2100, 3200, 6800, 5400, 1300, 2500, 8800],
    'price': [49.6, 29.46, 29.52, 30.02, 174.97, 175.23, 50.76, 50.32, 51.29]
}

df = pd.DataFrame(data)
window_size = 2

   
# Calculate time differences
def calculate_time_delta(group, window_size):
    tDelta = [pd.Timedelta(0)] * len(group)
    for i in range(window_size - 1, len(group)):
        tDelta[i] = group['timestamp'].iloc[i] - group['timestamp'].iloc[i - window_size + 1]
    return pd.Series(tDelta, index=group.index)

df['tDelta'] = df.groupby('sym').apply(lambda group: calculate_time_delta(group, window_size)).reset_index(level=0, drop=True) 

print(df)
#output is :
            timestamp  sym   qty   price          tDelta
0 1900-01-01 09:34:07    C  2200   49.60 0 days 00:00:00
1 1900-01-01 09:36:42   MS  1900   29.46 0 days 00:00:00
2 1900-01-01 09:36:51   MS  2100   29.52 0 days 00:00:09
3 1900-01-01 09:36:59   MS  3200   30.02 0 days 00:00:08
4 1900-01-01 09:32:47  IBM  6800  174.97 0 days 00:00:00
5 1900-01-01 09:35:26  IBM  5400  175.23 0 days 00:02:39
6 1900-01-01 09:34:16    C  1300   50.76 0 days 00:00:09
7 1900-01-01 09:34:26    C  2500   50.32 0 days 00:00:10
8 1900-01-01 09:38:12    C  8800   51.29 0 days 00:03:46

Now I need to implement this in DolphinDB. My table is defined as:

sym = `C`MS`MS`MS`IBM`IBM`C`C`C$SYMBOL
price = 49.6 29.46 29.52 30.02 174.97 175.23 50.76 50.32 51.29
qty = 2200 1900 2100 3200 6800 5400 1300 2500 8800
timestamp = [09:34:07,09:36:42,09:36:51,09:36:59,09:32:47,09:35:26,09:34:16,09:34:26,09:38:12]
t1 = table(timestamp, sym, qty, price)

I tried the following approach in DolphinDB:

// Define a custom aggregator for time delta
defg timeDelta(windowTimestamps) {
    return last(windowTimestamps) - first(windowTimestamps)
}

// Calculate rolling tDelta with window_size=2, grouped by sym
result = select 
    timestamp, 
    sym, 
    rolling(timeDelta, timestamp, 2, 1) as tDelta 
from t1 
context by sym 

However, when I execute this code, I encounter the following error:

The length of the vector returned by the UDF specified in 
SELECT must be the same as the length of the CONTEXT BY column

I’m unsure how to fix this error. Given that my actual dataset is quite large, I’m especially interested in the most efficient way to calculate the time differences between the first and last trade in each window for each stock symbol in DolphinDB.


Solution

  • You can try using mlast funciton(Gets the last timestamp in the current window) and mfirst function(Gets the first timestamp in the current window) :

    
    windowSize = 2
    
    // Main query statement
    select 
        sym,  
        // Calculate the time difference between last and first timestamp in each window
        mlast(timestamp, windowSize) - mfirst(timestamp, windowSize) as tDelta
    from t1  
    context by sym //Group calculations by symbol column