
Python: slicing a very large binary file


Say I have a 12GB binary file and I want to slice 8GB out of the middle of it. I know the byte offsets I want to cut between.

How do I do this? Obviously 12GB won't fit into memory, and that's fine, but 8GB won't either. I thought I could get around that by copying in chunks, but that doesn't seem to work for binary data: I was appending 10MB at a time to a new binary file, and the new file has discontinuities at the edges of each 10MB chunk.

Is there a Pythonic way of doing this easily?


Solution

  • Here's a quick example. Adapt as needed:

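    def copypart(src, dest, start, length, bufsize=1024*1024):
        # Open both files in binary mode ('rb'/'wb') so no newline translation occurs.
        with open(src, 'rb') as f1:
            f1.seek(start)  # jump to the first byte to copy
            with open(dest, 'wb') as f2:
                while length > 0:
                    # Never read past the end of the requested range.
                    data = f1.read(min(bufsize, length))
                    if not data:
                        break  # source ended before `length` bytes were read
                    f2.write(data)
                    length -= len(data)  # count the bytes actually read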
    
    if __name__ == '__main__':
        GIG = 2**30
        copypart('test.bin', 'test2.bin', 1 * GIG, 8 * GIG)
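
If you want to convince yourself that the chunk size doesn't introduce discontinuities, here's a rough self-check on a small file. It's only a sketch: `check_copypart` is a made-up helper name, the 10,240-byte payload and the offsets are arbitrary, and `copypart` is repeated here (with an early exit if the source runs out) just so the snippet runs on its own.

```python
import os
import tempfile


def copypart(src, dest, start, length, bufsize=1024 * 1024):
    """Copy `length` bytes of `src`, starting at byte `start`, into `dest`."""
    with open(src, 'rb') as f1, open(dest, 'wb') as f2:
        f1.seek(start)
        while length > 0:
            data = f1.read(min(bufsize, length))
            if not data:  # source ended early
                break
            f2.write(data)
            length -= len(data)


def check_copypart(bufsize):
    """Hypothetical helper: slice a small file and compare with an in-memory slice."""
    payload = bytes(range(256)) * 40  # 10,240 bytes of known content
    start, length = 1000, 5000        # arbitrary offsets for the test
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'src.bin')
        dest = os.path.join(tmp, 'dest.bin')
        with open(src, 'wb') as f:
            f.write(payload)
        copypart(src, dest, start, length, bufsize=bufsize)
        with open(dest, 'rb') as f:
            return f.read() == payload[start:start + length]


if __name__ == '__main__':
    # Try chunk sizes that do and do not divide the slice length evenly.
    for size in (1, 7, 4096, 1024 * 1024):
        print(size, check_copypart(size))
```

If the result is the same for every chunk size, the chunking itself is fine, and the discontinuities are more likely to come from something else, such as opening a file without the `b` flag or miscounting offsets between chunks.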