python  memory  visualization  large-files  memory-dump

Reading and processing large volatile memory file byte by byte


I'm currently using Python to work on memory dumps created by apps like Belkasoft, FTK Imager, and so on. They usually have the extension .mem or .dmp and are of the form:

53 FF 00 F0 53 FF 00 

I'd like to visualize this data, say with a heatmap or curves, ideally visualizing each individual byte. That would be 2,147,483,648 bytes in the case of a 2 GB file. How would you approach reading and processing large files like this (>= 2 GB)?

I've been experimenting with something like:

with open("File.mem",'rb') as file:
    byte = file.read(1)
    while byte:
         Do something

and managed to do some calculations, but it's painfully slow. I also tried reading the file line by line, which was fast, but reading the bytes of each line and doing some conversion was again painfully slow. I've also read a bit about numpy loadtxt, but didn't experiment much with it; I thought I'd ask here first.

Any ideas that might suit this scenario and make it more efficient?

Thanks a lot


Solution

  • The usual approach to reading big files is to use mmap. The file contents are mapped into your process's address space, and you can access them as if they were already in RAM. The OS takes care of loading the needed pages as you touch them, much like a swap file works: it knows the data lives in a file and loads it on demand, and it can also evict pages from RAM when it needs the memory for other purposes, because it can always load them again from the file.

    Take a look at the Python mmap module; a rough sketch of how it could be used here follows below.
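
    A minimal sketch of what that could look like, assuming a dump named "File.mem" and using NumPy (which the question already mentions) to view the mapping as a byte array; the chunk-averaging reduction at the end is only an illustration of how to shrink the data to something plottable:

        import mmap

        import numpy as np

        with open("File.mem", "rb") as f:
            # Map the whole file read-only; the OS pages data in on demand.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

            # Zero-copy view of the mapping as unsigned bytes.
            data = np.frombuffer(mm, dtype=np.uint8)

            # Illustrative reduction: average fixed-size chunks so ~2 GB of
            # bytes becomes a small profile that can be drawn as a heatmap
            # or a curve (e.g. with matplotlib).
            chunk = 4096
            usable = (len(data) // chunk) * chunk
            profile = data[:usable].reshape(-1, chunk).mean(axis=1)

            # Drop the view before closing; close() refuses to release a
            # buffer that is still exported to another object.
            del data
            mm.close()

        print(profile.shape, profile[:10])

    Here profile (or a 2-D reshape of it) is what would be handed to the plotting code. The important part is that the file is never loaded in one piece: the OS only pages in the parts that are actually touched.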