python

Iterating over a BufferedReader gives unexpected results


Numerous suggestions for counting the number of lines in a file can be found here

One of the suggestions is (effectively):

with open("foo.txt", "rb") as handle:
    line_count = sum(1 for _ in handle)

When I looked at that I thought "That can't be right" but it does indeed produce the correct result.

Here's what I don't understand: the file is opened in binary mode, so I would expect iterating over handle (which is an _io.BufferedReader) to yield one byte at a time.

It seems odd to me that a file opened in binary mode could be considered as line-oriented.

I must be missing something fundamental here.


Solution

  • io.BufferedReader inherits from io.BufferedIOBase, which in turn inherits from io.IOBase, where it's documented that:

    IOBase (and its subclasses) supports the iterator protocol, meaning that an IOBase object can be iterated over yielding the lines in a stream. Lines are defined slightly differently depending on whether the stream is a binary stream (yielding bytes), or a text stream (yielding character strings)

    So it is a design choice that iterating over a file object always yields lines; the only difference is the type of each item (bytes for a binary stream, str for a text stream). For binary files the line terminator is always b"\n".
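
To see the difference for yourself, here is a minimal sketch. The sample data and the temporary file are made up for illustration and are not from the question. It iterates over a binary handle once in the usual way, then reads it one byte at a time using the two-argument form of iter() with handle.read(1), which is one way to get the per-byte behaviour the question expected.

```python
import os
import tempfile

# Hypothetical sample data: three lines, the last one without a trailing newline.
data = b"alpha\nbeta\r\ngamma"

# Write the sample to a temporary file so we get a real _io.BufferedReader.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    # Plain iteration over a binary handle yields bytes objects, one per line.
    with open(path, "rb") as handle:
        lines = list(handle)

    # To get one byte at a time, call read(1) until it returns b"" (end of file).
    with open(path, "rb") as handle:
        single_bytes = list(iter(lambda: handle.read(1), b""))
finally:
    os.remove(path)

print(lines)              # [b'alpha\n', b'beta\r\n', b'gamma']
print(len(single_bytes))  # 17
```

Note that the "\r" stays inside the second line: in binary mode only b"\n" ends a line, and no newline translation takes place.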