python csv

Count the number of rows for each csv, faster code


I have a folder with several CSV files. When I open them in Notepad I can see the number of lines/rows, but that is a manual check. I wanted to automate it with the following code:

import os
import csv

with open("number of lines check.txt", "w") as a:
    for path, subdirs, files in os.walk(r'C:\Desktop\folder'):
        for filename in files:
            with open(os.path.join(path, filename), "r", encoding ="utf-8") as f:
                reader = csv.reader(f, delimiter ="\t")
                data = list(reader)
                row_count = len(data)
                f = os.path.join(path, filename)
                a.write(str(f)+" "+str(row_count) + os.linesep)

This works: it gives me a file with the filenames and row counts. However, for some reason the code takes a very long time to run, and I am not sure why. I assume it is because it has to read in each CSV? When I open the files in Notepad, they load quickly and the number of rows is displayed without any delay. Is my code inefficient, or is there a faster implementation?


Solution

  • It doesn't look like you're using the data for anything other than counting the rows. There's no need for a CSV reader for that.

    Files are iterable line by line, so you can loop over a file and count the iterations. Writing the output also has some overhead, so collecting the counts first and writing them out in one pass at the end should help.

    import os

    row_counts = {}

    for path, subdirs, files in os.walk(r'C:\Desktop\folder'):
        for filename in files:
            full_path = os.path.join(path, filename)
            with open(full_path, "r", encoding="utf-8") as f:
                # A file object yields one line per iteration.
                rows = len(list(f))
            # Key by full path: the same filename can occur in several subfolders.
            row_counts[full_path] = rows

    with open("number of lines check.txt", "w") as a:
        for f, count in row_counts.items():
            a.write(f"{f} {count}\n")

    If the files are very large, it would be better to iterate over them and maintain a running count so that you don't need to hold the whole file in memory at once.

    for filename in files:
        full_path = os.path.join(path, filename)
        with open(full_path, "r", encoding="utf-8") as f:
            # Count line by line; the file is never loaded as a whole.
            rows = 0
            for _ in f:
                rows += 1
        row_counts[full_path] = rows
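
If the line-by-line loop is still too slow, one further option is to skip text decoding altogether and count newline bytes in binary mode. The sketch below is only an illustration of that idea and has not been benchmarked here. The names `count_lines`, `count_lines_in_tree`, and `chunk_size` are made up for this example. It assumes lines end in `\n` or `\r\n`, and, like the line-based versions above, it counts physical lines, so a quoted CSV field containing a line break is counted as more than one line.

```python
import os


def count_lines(file_path, chunk_size=1024 * 1024):
    """Count lines by reading raw bytes in chunks and counting newline bytes."""
    newlines = 0
    last_byte = b""
    with open(file_path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            newlines += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        newlines += 1
    return newlines


def count_lines_in_tree(root):
    """Return {full_path: line_count} for every file under root."""
    counts = {}
    for path, subdirs, files in os.walk(root):
        for filename in files:
            full_path = os.path.join(path, filename)
            counts[full_path] = count_lines(full_path)
    return counts
```

Calling `count_lines_in_tree(r'C:\Desktop\folder')` returns a dictionary that can be written to the report file in the same way as `row_counts` above.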