pythonpandasdataframedatetimeetherscan

Python: converting timestamp to date time not working


I am requesting data from the api.etherscan.io website. For this, I require a free API key. I am getting information for the following wallet addresses 0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5, 0xc508dbe4866528db024fb126e0eb97595668c288. Below is the code I am using:

wallet_addresses = ['0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5', '0xc508dbe4866528db024fb126e0eb97595668c288']

page_number = 0
df_main = pd.DataFrame()
while True:
    for address in wallet_addresses:
        url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
        output = requests.get(url).text
        df_temp = pd.DataFrame(json.loads(output)['result'])
        df_temp['wallet_address'] = address
        df_main = df_main.append(df_temp)
        page_number += 1
        df_main['timeStamp'] = pd.to_datetime(df_main['timeStamp'], unit='s')
        if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime.date(2022, 1, 1):
            pass

Note that you need your own (free) ether_api.

What I want to do is get data from today's date, all the way back to 2022-01-01 which is what I am trying to achieve in the if statement.

However, the above gives me an error: ValueError: unit='s' not valid with non-numerical val='2022-09-19 18:14:47'

How can this be done? I've tried multiple methods to get pandas datetime to work, but all of them gave me errors.


Solution

  • Here you go, it's working without an error:

    page_number = 0
    df_main = pd.DataFrame()
    while True:
        for address in wallet_addresses:
            url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
            output = requests.get(url).text
            df_temp = pd.DataFrame(json.loads(output)['result'])
            df_temp['wallet_address'] = address
            page_number += 1
            df_temp['timeStamp'] = pd.to_datetime(df_temp['timeStamp'], unit='s')
            df_main = df_main.append(df_temp)
            if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime(2022, 1, 1).date():
                pass
    

    Wrong append

    So, what has happened here. As suggested in the first comment under question we acknowledged the type of first record in df_main with type(df_main['timeStamp'].iloc[0]). With IPython and Jupyter-Notebook one can look what is happening with df_main just after receiving an error with it being populated on the last for loop iteration that failed.
    Otherwise if one uses PyCharm or any other IDE with a possibility to debug, the contents of df_main can be revealed via debug.

    What we were missing, is that df_main = df_main.append(df_temp) is placed in a slightly wrong place. On first iteration it works well, pd.to_datetime(df_main['timeStamp'], unit='s') gets an str type with Linux epoch and gets converted to pandas._libs.tslibs.timestamps.Timestamp.

    But on next iteration df_main['timeStamp'] already has the Timestamp type and it gets appended with str type, so we get a column with mixed type. E.g.:

    type(df_main['timeStamp'].iloc[0]) == type(df_main['timeStamp'].iloc[-1])
    

    This results with False. Hence when trying to convert Timestamp to Timestamp one gets an error featured in question.

    To mitigate this we can place .append() below the conversion and do this conversion on df_temp instead of df_main, this way we will only append Timestamps to the resulting DataFrame and the code below with if clause will work fine.

    As a side note

    Another small change I've made was datetime.date(2022, 1, 1). This change was not needed, but the way one works with datetime depends on how this library was imported, so it's worth mentioning:

    import datetime
    datetime.date(2022, 1, 1)
    datetime.datetime(2022, 1, 1).date()
    
    from datetime import datetime
    datetime(2022, 1, 1).date()
    

    All the above is legit and will produce the same. On the first import module gets imported, on the second one type gets imported.


    Alternative solution

    Conversion to Timestamp takes time. If the API provides Linux epoch dates, why not use this date for comparison? Let's add this somewhere where you define wallet_addresses:

    reference_date = "01/01/2021"
    reference_date = int(time.mktime(datetime.datetime.strptime(reference_date, "%d/%m/%Y").timetuple()))
    

    This will result in 1609448400. Other stack overflow question as reference.

    This integer can now be compared with timestamps provided by the API. The only thing left is to cast str to int. We can have your code left intact with some minor changes at the end:

    << Your code without changes >>
            df_main['timeStamp'] = df_main['timeStamp'].astype(int)
            if min(df_main['timeStamp']) < reference_date:
                pass
    

    To make a benchmark I've changed while True: to for _ in range(0,4): to limit the infinite cycle, results are as follows:

    It's 30% faster. Casting str to int takes less time than conversion to TimeStamps, I would call this a preferable solution.

    Future warning

    FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
    

    It makes sense to comply with this warning. df_main = df_main.append(df_temp) has to be changed to df_main = pd.concat([df_main, df_temp]).

    As for current 1.5.0 version it's already deprecated. Time to upgrade!