pythonpandasdatetimetime-seriespandas-timeindex

Manipulate the Dataframe to start from the nearest varying Midnight timestamp


My goal:

I have a dataset that gets generated every day at random hours leading to the first row to start at a random time. I want to make this dataset start from the nearest midnight date. For example, if the date on the first row is 2022-05-09 15:00:00, I would have to slice the data to make it start from the nearest midnight, in this case: 2022-05-10 00:00:00

Here's what the dataset looks like:

data

What I have tried:

I had the idea of locating the index of the first occurrence of my desired timestamp and applying iloc to create the desired data set.

match_timestamp = "00:00:00"
[df[df.index.strftime("%H:%M:%S") == match_timestamp].first_valid_index()]

results: [Timestamp('2022-05-10 00:00:00')]

However, this would only result in extracting the timestamp where it first appears, and I would not be able to apply iloc to the row value. As of now, I'm stuck and can't think of a more elegant solution, which I'm sure exists.

I would be grateful if you could recommend a better method to this. Thank you in advance!

Here's the complete code to extract the df:

pip install ccxt

import pandas as pd
import ccxt

exchange = ccxt.okx({'options': {'defaultType': 'futures', 'enableRateLimit': True}})
markets = exchange.load_markets()

url = 'https://www.okex.com'
tickers = pd.DataFrame((requests.get(url+'/api/v5/market/tickers?instType=FUTURES').json())['data'])
tickers = tickers.drop('instType', axis=1)
futures_tickers = list(tickers['instId'])

symbol = 'LINK-USD-220930'
candlestick_chart= exchange.fetch_ohlcv(symbol, '1h', limit=500)
candlestick_df = pd.DataFrame(candlestick_chart)
candlestick_df.columns = ['date', 'open', 'high', 'low', 'close', 'volume']

candlestick_df['date'] = pd.to_datetime(candlestick_df['date'], unit='ms')
candlestick_df['date'] = candlestick_df['date'] + pd.Timedelta(hours=8)

df  = candlestick_df
df

The dictionary format: (as suggested)

{'open': {Timestamp('2022-05-09 15:00:00'): 9.742, Timestamp('2022-05-09 16:00:00'): 9.731, Timestamp('2022-05-09 17:00:00'): 9.743, Timestamp('2022-05-09 18:00:00'): 9.684, Timestamp('2022-05-09 19:00:00'): 9.206, Timestamp('2022-05-09 20:00:00'): 9.43, Timestamp('2022-05-09 21:00:00'): 9.316, Timestamp('2022-05-09 22:00:00'): 9.403, Timestamp('2022-05-09 23:00:00'): 9.215, Timestamp('2022-05-10 00:00:00'): 9.141}, 'high': {Timestamp('2022-05-09 15:00:00'): 9.835, Timestamp('2022-05-09 16:00:00'): 9.75, Timestamp('2022-05-09 17:00:00'): 9.788, Timestamp('2022-05-09 18:00:00'): 9.697, Timestamp('2022-05-09 19:00:00'): 9.465, Timestamp('2022-05-09 20:00:00'): 9.469, Timestamp('2022-05-09 21:00:00'): 9.515, Timestamp('2022-05-09 22:00:00'): 9.413, Timestamp('2022-05-09 23:00:00'): 9.308, Timestamp('2022-05-10 00:00:00'): 9.223}, 'low': {Timestamp('2022-05-09 15:00:00'): 9.699, Timestamp('2022-05-09 16:00:00'): 9.596, Timestamp('2022-05-09 17:00:00'): 9.674, Timestamp('2022-05-09 18:00:00'): 8.739, Timestamp('2022-05-09 19:00:00'): 9.11, Timestamp('2022-05-09 20:00:00'): 9.3, Timestamp('2022-05-09 21:00:00'): 9.208, Timestamp('2022-05-09 22:00:00'): 9.174, Timestamp('2022-05-09 23:00:00'): 9.035, Timestamp('2022-05-10 00:00:00'): 8.724}, 'close': {Timestamp('2022-05-09 15:00:00'): 9.725, Timestamp('2022-05-09 16:00:00'): 9.745, Timestamp('2022-05-09 17:00:00'): 9.682, Timestamp('2022-05-09 18:00:00'): 9.18, Timestamp('2022-05-09 19:00:00'): 9.426, Timestamp('2022-05-09 20:00:00'): 9.32, Timestamp('2022-05-09 21:00:00'): 9.397, Timestamp('2022-05-09 22:00:00'): 9.229, Timestamp('2022-05-09 23:00:00'): 9.152, Timestamp('2022-05-10 00:00:00'): 8.82}, 'volume': {Timestamp('2022-05-09 15:00:00'): 3663.0, Timestamp('2022-05-09 16:00:00'): 6603.0, Timestamp('2022-05-09 17:00:00'): 2855.0, Timestamp('2022-05-09 18:00:00'): 20084.0, Timestamp('2022-05-09 19:00:00'): 8972.0, Timestamp('2022-05-09 20:00:00'): 5551.0, Timestamp('2022-05-09 21:00:00'): 8218.0, Timestamp('2022-05-09 22:00:00'): 7651.0, Timestamp('2022-05-09 23:00:00'): 6935.0, Timestamp('2022-05-10 00:00:00'): 10409.0}}

Solution

  • my minimalistic approach of the pandaNewstarter you simply can apply it to your candlestick_df:

    import pandas as pd
    import datetime
    df = pd.read_csv("data.csv")
    df.dtypes
    # convert date column to dtype timestamp
    df.date = pd.to_datetime(df.date) 
    
    # get min value from date colum
    min_date = df.date.min()
    
    # from min get next day midnight timestamp value
    NextDay_Date = (min_date + datetime.timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    NextDay_Date
    
    # create new DateFrame by slicing original
    df2 = df[df.date >= NextDay_Date].copy()
    

    out:

    print(NextDay_Date)
    2022-05-10 00:00:00
    print(df2)
                       date   open   high    low  close   volume
    9   2022-05-10 00:00:00  9.141  9.223  8.724  8.820  10409.0
    10  2022-05-10 01:00:00  8.755  8.979  8.558  8.832  11522.0
    11  2022-05-10 02:00:00  8.815  8.880  8.304  8.593  20969.0
    12  2022-05-10 03:00:00  8.618  8.720  8.370  8.610  15794.0
    13  2022-05-10 04:00:00  8.610  8.929  8.610  8.736   9410.0
    ..                  ...    ...    ...    ...    ...      ...