pythonnumpynotepad++hexdump

numpy.array.tofile() binary file looks "strange" in notepad++


I am just wondering how the function actually stores the data. Because to me, it looks completely strange. Say I have the following code:

import numpy as np
filename = "test.dat"
print(filename)
fileobj = open(filename, mode='wb')
off = np.array([1, 300], dtype=np.int32)
off.tofile(fileobj)
fileobj.close()

fileobj2 = open(filename, mode='rb')
off = np.fromfile(fileobj2, dtype = np.int32)
print(off)
fileobj2.close()

Now I expect 8 bytes inside the file, where each element is represented by 4 bytes (and I could live with any endianness). However when I open up the file in a hex editor (used notepad++ with hex editor plugin) I get the following bytes:

01 00 C4 AC 00

5 bytes, and I have no idea at all what it represents. The first byte looks like it is the number, but then what follows is something weird, certainly not "300".

Yet reloading shows the original array.

Is this something I don't understand in python, or is it a problem in notepad++? - I notice the hex looks different if I select a different "encoding" (huh?). Also: Windows does report it being 8 bytes long.


Solution

  • You can tell very easily that the file actually does have 8 bytes, the same 8 bytes you'd expect (01 00 00 00 2C 01 00 00) just by using anything other than Notepad++ to look at the file, including just replacing your off = np.fromfile(fileobj2, dtype=np.int32) with off = fileobj2.read() then printing the bytes (which will give you b'\x01\x00\x00\x00,\x01\x00\x00'1)).

    And, from your comments, after I suggested that, you tried it, and saw exactly that.

    Which means this is either a bug in Notepad++, or a problem with the way you're using it; Python, NumPy, and your own code are perfectly fine.


    1) In case it isn't clear: '\x2c' and ',' are the same character, and bytes uses the printable ASCII representation for printable ASCII characters, as well as familiar escapes like '\n', when possible, only using the hex backslash escape for other values.