Tags: python, file, iterator

What is the idiomatic way to iterate over a binary file?


With a text file, I can write this:

with open(path, 'r') as file:
    for line in file:
        pass  # handle the line

This is equivalent to:

with open(path, 'r') as file:
    for line in iter(file.readline, ''):
        pass  # handle the line

This idiom is documented in PEP 234, but I have not found a similar idiom for binary files.
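
For reference, the two-argument form `iter(callable, sentinel)` calls the callable repeatedly and stops as soon as it returns the sentinel. Here is a minimal sketch of that equivalence, using an in-memory `io.StringIO` as a stand-in for a real text file (the sample content is made up for illustration):

```python
import io

# Hypothetical sample content standing in for a real text file.
text = "first\nsecond\nthird\n"

# Plain iteration over the file object yields one line at a time.
direct = list(io.StringIO(text))

# iter(callable, sentinel): call readline() until it returns ''.
buffer = io.StringIO(text)
via_sentinel = list(iter(buffer.readline, ''))

print(direct == via_sentinel)  # True
```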

With a binary file, I can write this:

with open(path, 'rb') as file:
    while True:
        chunk = file.read(1024 * 64)
        if not chunk:
            break
        # handle the chunk

I have tried the same idiom as with a text file:

def make_read(file, size):
    def read():
        return file.read(size)
    return read

with open(path, 'rb') as file:
    for chunk in iter(make_read(file, 1024 * 64), b''):
        pass  # handle the chunk

Is this the idiomatic way to iterate over a binary file in Python?


Solution

  • I don't know of any built-in way to do this, but a wrapper function is easy enough to write:

    def read_in_chunks(infile, chunk_size=1024*64):
        while True:
            chunk = infile.read(chunk_size)
            if chunk:
                yield chunk
            else:
                # The chunk was empty, which means we're at the end
                # of the file
                return
    

    Then, with the function saved in a module named chunks.py, at the interactive prompt:

    >>> from chunks import read_in_chunks
    >>> infile = open('quicklisp.lisp', 'rb')
    >>> for chunk in read_in_chunks(infile):
    ...     print(chunk)
    ...
    <contents of quicklisp.lisp in chunks>
    >>> infile.close()
    

    Of course, you can easily adapt this to use a with block:

    with open('quicklisp.lisp', 'rb') as infile:
        for chunk in read_in_chunks(infile):
            print(chunk)
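
To see how the generator splits its input, here is a small sketch that runs it on an in-memory `io.BytesIO` buffer instead of a file on disk. The 10-byte payload and the chunk size of 4 are arbitrary values chosen for illustration.

```python
import io

def read_in_chunks(infile, chunk_size=1024 * 64):
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            # An empty result means end of file.
            return
        yield chunk

# Hypothetical data: 10 bytes read 4 at a time.
buffer = io.BytesIO(b'0123456789')
chunks = list(read_in_chunks(buffer, chunk_size=4))
print(chunks)  # [b'0123', b'4567', b'89']
```

Note that the last chunk can be shorter than `chunk_size`, so code that handles the chunks should not assume a fixed length.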
    

    You can also eliminate the if statement by reading once before the loop and again at the end of each iteration:

    def read_in_chunks(infile, chunk_size=1024*64):
        chunk = infile.read(chunk_size)
        while chunk:
            yield chunk
            chunk = infile.read(chunk_size)
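
As an alternative to a hand-written generator, the two-argument `iter()` from the question can be combined with `functools.partial`, which removes the need for the `make_read` helper. The sketch below assumes Python 3; the function name `iter_chunks` is hypothetical, and the temporary file exists only to make the example self-contained.

```python
import functools
import os
import tempfile

def iter_chunks(infile, chunk_size=1024 * 64):
    # partial() binds the size argument, giving iter() the
    # zero-argument callable it needs; b'' is the end-of-file sentinel.
    return iter(functools.partial(infile.read, chunk_size), b'')

# Write some throwaway data so there is a real binary file to read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 150_000)
    path = tmp.name

try:
    with open(path, 'rb') as infile:
        sizes = [len(chunk) for chunk in iter_chunks(infile)]
finally:
    os.remove(path)

print(sizes)  # [65536, 65536, 18928]
```

The sentinel has to match the file mode: `b''` for a file opened in binary mode, `''` for text mode. On Python 3.8 and later, `while chunk := infile.read(chunk_size):` expresses the same loop inline without a helper.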