I am requesting data from the api.etherscan.io
website. For this, I require a free API key. I am getting information for the following wallet addresses 0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5, 0xc508dbe4866528db024fb126e0eb97595668c288
. Below is the code I am using:
wallet_addresses = ['0xdafea492d9c6733ae3d56b7ed1adb60692c98bc5', '0xc508dbe4866528db024fb126e0eb97595668c288']
page_number = 0
df_main = pd.DataFrame()
while True:
for address in wallet_addresses:
url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
output = requests.get(url).text
df_temp = pd.DataFrame(json.loads(output)['result'])
df_temp['wallet_address'] = address
df_main = df_main.append(df_temp)
page_number += 1
df_main['timeStamp'] = pd.to_datetime(df_main['timeStamp'], unit='s')
if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime.date(2022, 1, 1):
pass
Note that you need your own (free) ether_api
.
What I want to do is get data from today's date, all the way back to 2022-01-01
which is what I am trying to achieve in the if statement.
However, the above gives me an error: ValueError: unit='s' not valid with non-numerical val='2022-09-19 18:14:47'
How can this be done? I've tried multiple methods to get pandas datetime to work, but all of them gave me errors.
Here you go, it's working without an error:
page_number = 0
df_main = pd.DataFrame()
while True:
for address in wallet_addresses:
url=f'https://api.etherscan.io/api?module=account&action=txlist&address={address}&startblock=0&endblock=99999999&page={page_number}&offset=10&sort=asc&apikey={ether_api}'
output = requests.get(url).text
df_temp = pd.DataFrame(json.loads(output)['result'])
df_temp['wallet_address'] = address
page_number += 1
df_temp['timeStamp'] = pd.to_datetime(df_temp['timeStamp'], unit='s')
df_main = df_main.append(df_temp)
if min(pd.to_datetime(df_main['timeStamp']).dt.date) < datetime(2022, 1, 1).date():
pass
So, what has happened here. As suggested in the first comment under question we acknowledged the type
of first record in df_main
with type(df_main['timeStamp'].iloc[0])
. With IPython and Jupyter-Notebook one can look what is happening with df_main
just after receiving an error with it being populated on the last for
loop iteration that failed.
Otherwise if one uses PyCharm or any other IDE with a possibility to debug, the contents of df_main
can be revealed via debug.
What we were missing, is that df_main = df_main.append(df_temp)
is placed in a slightly wrong place. On first iteration it works well, pd.to_datetime(df_main['timeStamp'], unit='s')
gets an str
type with Linux epoch and gets converted to pandas._libs.tslibs.timestamps.Timestamp
.
But on next iteration df_main['timeStamp']
already has the Timestamp type and it gets appended with str
type, so we get a column with mixed type. E.g.:
type(df_main['timeStamp'].iloc[0]) == type(df_main['timeStamp'].iloc[-1])
This results with False. Hence when trying to convert Timestamp to Timestamp one gets an error featured in question.
To mitigate this we can place .append()
below the conversion and do this conversion on df_temp
instead of df_main
, this way we will only append Timestamps to the resulting DataFrame and the code below with if
clause will work fine.
Another small change I've made was datetime.date(2022, 1, 1)
. This change was not needed, but the way one works with datetime
depends on how this library was imported, so it's worth mentioning:
import datetime
datetime.date(2022, 1, 1)
datetime.datetime(2022, 1, 1).date()
from datetime import datetime
datetime(2022, 1, 1).date()
All the above is legit and will produce the same. On the first import module
gets imported, on the second one type
gets imported.
Conversion to Timestamp takes time. If the API provides Linux epoch dates, why not use this date for comparison? Let's add this somewhere where you define wallet_addresses
:
reference_date = "01/01/2021"
reference_date = int(time.mktime(datetime.datetime.strptime(reference_date, "%d/%m/%Y").timetuple()))
This will result in 1609448400. Other stack overflow question as reference.
This integer can now be compared with timestamps provided by the API. The only thing left is to cast str
to int
. We can have your code left intact with some minor changes at the end:
<< Your code without changes >>
df_main['timeStamp'] = df_main['timeStamp'].astype(int)
if min(df_main['timeStamp']) < reference_date:
pass
To make a benchmark I've changed while True:
to for _ in range(0,4):
to limit the infinite cycle, results are as follows:
It's 30% faster. Casting str
to int
takes less time than conversion to TimeStamps, I would call this a preferable solution.
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
It makes sense to comply with this warning. df_main = df_main.append(df_temp)
has to be changed to df_main = pd.concat([df_main, df_temp])
.
As for current 1.5.0 version it's already deprecated. Time to upgrade!