Tags: python, pandas, dataframe, numpy, large-files

Pandas: looping over a large file in chunks, how do I get the number of chunks?


I'm using pandas to read a large file; the file size is 11 GB.

import pandas as pd

chunksize = 100000
for df_ia in pd.read_csv(file, chunksize=chunksize,
                         iterator=True, low_memory=False):
    ...  # process each chunk here

My question is: how can I get the total number of chunks? Right now all I can do is set an index and count the chunks one by one, but that doesn't look like a smart way to do it:

index = 0
chunksize = 100000
for df_ia in pd.read_csv(file, chunksize=chunksize,
                         iterator=True, low_memory=False):
    index += 1

After looping over the whole file, the final value of index is the total number of chunks, but is there any faster way to get it directly?


Solution

  • You can use the enumerate function, for example:

    for i, df_ia in enumerate(pd.read_csv(file, chunksize=5,
                                          iterator=True, low_memory=False)):
        ...  # process each chunk here

    Then, after you finish iterating, i will be the number of chunks minus one, so the total number of chunks is i + 1.
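
    As a minimal, self-contained sketch of that idea (the path some_large_file.csv and the chunk size are placeholders, not from the original post):

        import pandas as pd

        file = "some_large_file.csv"   # placeholder path, not from the original post
        chunksize = 100000

        n_chunks = 0
        for i, df_ia in enumerate(pd.read_csv(file, chunksize=chunksize,
                                              iterator=True, low_memory=False)):
            # ... process df_ia here ...
            n_chunks = i + 1           # enumerate starts at 0, so the count is i + 1

        print("total chunks:", n_chunks)

    Note that this still reads the whole file once: pandas yields the chunks lazily, so it cannot report how many there will be before it has parsed the file.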