python csv

Get length of CSV to show progress


I am working with a large number of CSV files, each of which contains a large number of rows. My goal is to read the data line by line and write it to a database using Python. However, because there is so much data, I would like to keep track of how much has been written. For the files, I count how many are queued and add one to a counter every time a file is complete.

I would like to do something similar for the rows of each CSV file: show which row I am on and how many rows there are in total (for example: Currently on row 1 of X). I can easily get the current row by starting at one and then doing something like currentRow += 1, but I am unsure how to get the total without going through the time-consuming process of reading every line.

Additionally, because my CSV files are all stored in zip archives, I am currently reading them using the zipfile module like this:

# The zip archive and the CSV file inside it share the same name
with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
    lines = (line.decode('ascii') for line in csvFile)
    currentRow = 1

    for row in csv.reader(lines):
        print(row)
        currentRow += 1

Any ideas on how I can quickly get a total row count of a CSV file?


Solution

  • If you just want to show some progress, you could try using tqdm.

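    import csv
    from tqdm import tqdm
    
    with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
        # Read every line into a list so the total is known before looping
        lines = [line.decode('ascii') for line in csvFile]
        currentRow = 1
    
        # total is the number of physical lines, used to size the progress bar
        for row in tqdm(csv.reader(lines), total=len(lines)):
            print(row)
            currentRow += 1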
    

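    This should give you a progress bar with very little extra code. Note that the list comprehension reads the whole file into memory so that len(lines) is known up front, and that len(lines) counts physical lines, which can exceed the number of CSV rows if quoted fields contain line breaks.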
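
If holding every line in memory is too expensive, one possible alternative is to measure progress in bytes instead of rows. The following is only a sketch and not part of the original answer: the helper name iter_rows_with_progress is hypothetical, and it assumes the uncompressed size reported by ZipInfo.file_size (from the standard zipfile module) is a good enough total. It does not tell you the total row count, only what fraction of the file has been consumed so far.

```python
import csv
import io
import zipfile


def iter_rows_with_progress(zip_archive, member_name, encoding="ascii"):
    """Yield (row, fraction_done) for each CSV row of a zip member.

    Hypothetical helper: fraction_done is estimated from the bytes consumed
    versus the member's uncompressed size, so no counting pass is needed.
    """
    # Uncompressed size of the member, stored in the zip directory
    total_bytes = zip_archive.getinfo(member_name).file_size
    bytes_read = 0

    with zip_archive.open(member_name, "r") as csv_file:

        def decoded_lines():
            # Count raw bytes as csv.reader lazily pulls each line
            nonlocal bytes_read
            for raw_line in csv_file:
                bytes_read += len(raw_line)
                yield raw_line.decode(encoding)

        for row in csv.reader(decoded_lines()):
            fraction = bytes_read / total_bytes if total_bytes else 1.0
            yield row, fraction


if __name__ == "__main__":
    # Build a small archive in memory purely for demonstration
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo_zip:
        demo_zip.writestr("data.csv", "a,b\n1,2\n3,4\n")

    with zipfile.ZipFile(buffer) as demo_zip:
        for row, fraction in iter_rows_with_progress(demo_zip, "data.csv"):
            print(f"{fraction:6.1%} {row}")
```

The same byte count could drive a tqdm bar by creating it with total=total_bytes, unit='B', unit_scale=True and calling update(len(raw_line)) for each line, at the cost of showing bytes rather than "row 1 of X".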