
How to count the number of rows in an rpt file in Python without reading the whole document?

I have quite a lot of data. More precisely, an 8 GB rpt file.

Before processing it, I want to know how many rows it actually contains, which will help me estimate how long the processing will take. Reading an rpt file of that size into memory as a whole obviously does not work, so I need to read it line by line. To find out the number of lines, I wrote this simple Python script:

import pandas as pd

count = 0
# chunksize=1 yields one single-row DataFrame per parsed row
for chunk in pd.read_fwf("test.rpt", chunksize=1):
    count += 1

This seems to work. However, it is quite slow, and actually parsing every line seems unnecessary.

Is there a way to get the number of rows without reading each line?

Many thanks


  • I'm not familiar with the .rpt file format, but if it can be read in as a text file (which I'm assuming it can if you're using pd.read_fwf) then you can probably just use Python's builtins for input/output.

    with open('test.rpt', 'r') as testfile:
        for i, line in enumerate(testfile):
            pass
    # Add one to get the line count (i is zero-based)
    print(i + 1)

    This will allow you to (efficiently) iterate over each line of the file object. The builtin enumerate function will count each line as you read it.
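
If decoding every line as text is still too slow, one possible alternative is to read the file in large binary blocks and count newline bytes. This is only a rough sketch: the helper name `count_lines` and the 1 MiB block size are my own choices, and it assumes the file uses a single-byte or UTF-8 encoding with `\n` line endings (for a UTF-16 export the byte count could be off). It still reads every byte once, since the line count cannot be known without scanning the file, but it skips parsing and decoding.

```python
def count_lines(path, block_size=1024 * 1024):
    """Count lines by counting newline bytes in fixed-size binary blocks.

    Hypothetical helper; assumes "\n" line endings in a single-byte
    or UTF-8 encoded file.
    """
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            count += block.count(b"\n")
            last_byte = block[-1:]
    # A final line without a trailing newline still counts as a line.
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```

Calling `count_lines('test.rpt')` would return the number of raw lines. If the rpt file carries header or footer lines, those are included in the total, so the number of data rows may be slightly lower.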