python parsing

How can a Python program load and read specific lines from a file?


I have a huge file of numbers in binary format, and only certain parts of it need to be parsed into an array. I looked into numpy.fromfile and open, but they don't seem to have an option to read from location A to location B in the file. Can this be done?


Solution

  • If you're dealing with "huge files", I would not simply read and discard everything up to the point where you actually need the data.

    Instead, file objects in Python have a .seek() method, which you can use to jump straight to the byte offset where you need to start parsing, efficiently bypassing everything before it.

    with open('huge_file.dat', 'rb') as f:
        f.seek(1024 * 1024 * 1024)  # skip the first 1 GiB (the offset is in bytes)
        ...
    

    See also: http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects