I am working with a large number of CSV files, each of which contains a large number of rows. My goal is to read the data line by line and write it to a database using Python. However, because there is so much data, I would like to keep track of how much has been written. For the files, I count how many are queued and add one to a counter every time a file is complete.
I would like to do something similar for the CSV files and show which row I am on and how many rows there are in total (for example: Currently on row 1 of X). I can easily get the current row by starting at one and then doing something like currentRow += 1; however, I am unsure how to get the total without going through the time-consuming process of reading every line.
Additionally, because my CSV files are all stored in zip archives, I am currently reading them using the zipfile module like this:
    # The zip archive and the CSV file share the same name
    with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
        lines = (line.decode('ascii') for line in csvFile)
        currentRow = 1
        for row in csv.reader(lines):
            print(row)
            currentRow += 1
Any ideas on how I can quickly get a total row count of a CSV file?
If you just want to show some progress, you could try using tqdm.
    from tqdm import tqdm
    with zipArchive.open(fileName[:-4] + '.csv', 'r') as csvFile:
        lines = [line.decode('ascii') for line in csvFile]
        currentRow = 1
        for row in tqdm(csv.reader(lines), total=len(lines)):
            print(row)
            currentRow += 1
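This should give you a sleek progress bar with virtually no effort on your part. Note that the list comprehension reads the whole file into memory before the loop starts, and len(lines) counts physical lines, which equals the number of CSV rows only if no quoted field contains a line break.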
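If holding every decoded line in memory is a concern, one alternative is a cheap first pass that only counts newline bytes, followed by a second pass that actually parses the rows. The following is a minimal sketch under that assumption, not a definitive implementation; the helper names count_lines and rows_with_progress are made up for illustration, and the total is still a physical line count, so it can overstate the row count when quoted fields contain line breaks.

```python
import csv
import io
import zipfile


def count_lines(archive, member, chunk_size=1 << 20):
    """Count physical lines in a zip member without decoding or storing them."""
    total = 0
    last = b""
    with archive.open(member, "r") as raw:
        while True:
            chunk = raw.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
            last = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last and last != b"\n":
        total += 1
    return total


def rows_with_progress(archive, member, encoding="ascii"):
    """Yield (current_row, total_lines, row) for each CSV row in the member."""
    total = count_lines(archive, member)  # first pass: count only
    # Second pass: stream and parse the rows one at a time.
    with io.TextIOWrapper(archive.open(member, "r"),
                          encoding=encoding, newline="") as text:
        for current, row in enumerate(csv.reader(text), start=1):
            yield current, total, row
```

With an open zipfile.ZipFile called zipArchive, this could be used as: for current, total, row in rows_with_progress(zipArchive, fileName[:-4] + '.csv'): print(f"Currently on row {current} of {total}"). The member is decompressed twice, so this trades extra CPU time for constant memory use. If a second pass is too slow, zipArchive.getinfo(name).file_size gives the uncompressed size in bytes, which can serve as the total for byte-based progress instead.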