pythonnumpyfortranbinary-data

Reading binary data file in python for analysis


I use Fortran to write data to a binary file in the following format

open(unit=99,form='unformatted',status='unknown')
do i=1,N
write(99) (i),(A(i)),(B(i))
enddo
close(99)

Here, A and B are double precision arrays. How can this binary data file be read in python?

PS: So far, I tried the following steps to read data.

with open('fort.99', 'rb') as binfile:
    data=binfile.read()

Though the file reading is succesful, I am unable to use the numpy functions on this data, which is my preferred choice for most of the analyses.


Solution

  • Updated Answer

    I played around with this some more in Numpy and you can read the file much more cleanly like this - the information below still applies and explains how it works:

    import numpy as np
    
    # Read file and reshape as "records" of 28 bytes each
    n = np.fromfile('fort.99',dtype=np.uint8).reshape(-1,28)
    I = n[:,4:8].copy().view(np.int32)     # pick bytes 4-8, make contiguous and view as integer
    A = n[:,8:16].copy().view(np.float64)  # pick bytes 8-16, make contiguous and view as float64
    B = n[:,16:24].copy().view(np.float64) # pick bytes 16-24, make contiguous and view as float64
    

    Original Answer

    I changed your program to have some recognisable data:

          program thing
          REAL*8, DIMENSION(128) :: A,B
          INTEGER N
          N=128
    
          open(unit=99,form='unformatted',status='unknown')
          do i=1,N
          A(i)=100.0 * i
          B(i)=-1000.0 * i
          write(99) (2*i),(A(i)),(B(i))
          enddo
          close(99)
          end program
    

    Then I looked at the file size and it was 3,584 bytes, so I divided that by 128 to get the bytes per fortran WRITE as 28.

    So, inspected the data with xxd as follows:

    xxd -c 28 fort.99
    
    00000000: 1400 0000 0200 0000 0000 0000 0000 5940 0000 0000 0040 8fc0 1400 0000  ..............Y@.....@......
    0000001c: 1400 0000 0400 0000 0000 0000 0000 6940 0000 0000 0040 9fc0 1400 0000  ..............i@.....@......
    00000038: 1400 0000 0600 0000 0000 0000 00c0 7240 0000 0000 0070 a7c0 1400 0000  ..............r@.....p......
    00000054: 1400 0000 0800 0000 0000 0000 0000 7940 0000 0000 0040 afc0 1400 0000  ..............y@.....@......
    00000070: 1400 0000 0a00 0000 0000 0000 0040 7f40 0000 0000 0088 b3c0 1400 0000  .............@.@............
    0000008c: 1400 0000 0c00 0000 0000 0000 00c0 8240 0000 0000 0070 b7c0 1400 0000  ...............@.....p......
    000000a8: 1400 0000 0e00 0000 0000 0000 00e0 8540 0000 0000 0058 bbc0 1400 0000  ...............@.....X......
    000000c4: 1400 0000 1000 0000 0000 0000 0000 8940 0000 0000 0040 bfc0 1400 0000  ...............@.....@......
    000000e0: 1400 0000 1200 0000 0000 0000 0020 8c40 0000 0000 0094 c1c0 1400 0000  ............. .@............
    000000fc: 1400 0000 1400 0000 0000 0000 0040 8f40 0000 0000 0088 c3c0 1400 0000  .............@.@............
    

    So there are 28 bytes per fortran WRITE and there are 4 bytes of indexing at the start and end of each record.

    Then I decoded them like this:

    #!/usr/bin/env python3
    
    import struct
    
    # Important to open the file in binary mode, 'b'
    with open('fort.99','rb') as f:
         # You should really read until error in case there are more or fewer than 128 records one day - but you know there are 128
    
        for i in range(128):
            # Read one record, i.e. output of one fortran WRITE
            record = f.read(28)
            # Unpack 2 INTEGERs, 2 DOUBLEs and an INTEGER
            _, I, A, B, _ = struct.unpack('2i2di',record)
            print(I,A,B)
    

    Output

    2 100.0 -1000.0
    4 200.0 -2000.0
    6 300.0 -3000.0
    8 400.0 -4000.0
    10 500.0 -5000.0
    12 600.0 -6000.0
    14 700.0 -7000.0
    16 800.0 -8000.0
    18 900.0 -9000.0
    20 1000.0 -10000.0
    22 1100.0 -11000.0
    24 1200.0 -12000.0
    26 1300.0 -13000.0
    28 1400.0 -14000.0
    30 1500.0 -15000.0
    32 1600.0 -16000.0
    34 1700.0 -17000.0
    36 1800.0 -18000.0
    38 1900.0 -19000.0
    40 2000.0 -20000.0
    42 2100.0 -21000.0
    44 2200.0 -22000.0
    46 2300.0 -23000.0
    48 2400.0 -24000.0
    50 2500.0 -25000.0
    52 2600.0 -26000.0
    54 2700.0 -27000.0
    56 2800.0 -28000.0
    58 2900.0 -29000.0
    60 3000.0 -30000.0
    62 3100.0 -31000.0
    64 3200.0 -32000.0
    66 3300.0 -33000.0
    68 3400.0 -34000.0
    70 3500.0 -35000.0
    72 3600.0 -36000.0
    74 3700.0 -37000.0
    76 3800.0 -38000.0
    78 3900.0 -39000.0
    80 4000.0 -40000.0
    82 4100.0 -41000.0
    84 4200.0 -42000.0
    86 4300.0 -43000.0
    88 4400.0 -44000.0
    90 4500.0 -45000.0
    92 4600.0 -46000.0
    94 4700.0 -47000.0
    96 4800.0 -48000.0
    98 4900.0 -49000.0
    100 5000.0 -50000.0
    102 5100.0 -51000.0
    104 5200.0 -52000.0
    106 5300.0 -53000.0
    108 5400.0 -54000.0
    110 5500.0 -55000.0
    112 5600.0 -56000.0
    114 5700.0 -57000.0
    116 5800.0 -58000.0
    118 5900.0 -59000.0
    120 6000.0 -60000.0
    122 6100.0 -61000.0
    124 6200.0 -62000.0
    126 6300.0 -63000.0
    128 6400.0 -64000.0
    130 6500.0 -65000.0
    132 6600.0 -66000.0
    134 6700.0 -67000.0
    136 6800.0 -68000.0
    138 6900.0 -69000.0
    140 7000.0 -70000.0
    142 7100.0 -71000.0
    144 7200.0 -72000.0
    146 7300.0 -73000.0
    148 7400.0 -74000.0
    150 7500.0 -75000.0
    152 7600.0 -76000.0
    154 7700.0 -77000.0
    156 7800.0 -78000.0
    158 7900.0 -79000.0
    160 8000.0 -80000.0
    162 8100.0 -81000.0
    164 8200.0 -82000.0
    166 8300.0 -83000.0
    168 8400.0 -84000.0
    170 8500.0 -85000.0
    172 8600.0 -86000.0
    174 8700.0 -87000.0
    176 8800.0 -88000.0
    178 8900.0 -89000.0
    180 9000.0 -90000.0
    182 9100.0 -91000.0
    184 9200.0 -92000.0
    186 9300.0 -93000.0
    188 9400.0 -94000.0
    190 9500.0 -95000.0
    192 9600.0 -96000.0
    194 9700.0 -97000.0
    196 9800.0 -98000.0
    198 9900.0 -99000.0
    200 10000.0 -100000.0
    202 10100.0 -101000.0
    204 10200.0 -102000.0
    206 10300.0 -103000.0
    208 10400.0 -104000.0
    210 10500.0 -105000.0
    212 10600.0 -106000.0
    214 10700.0 -107000.0
    216 10800.0 -108000.0
    218 10900.0 -109000.0
    220 11000.0 -110000.0
    222 11100.0 -111000.0
    224 11200.0 -112000.0
    226 11300.0 -113000.0
    228 11400.0 -114000.0
    230 11500.0 -115000.0
    232 11600.0 -116000.0
    234 11700.0 -117000.0
    236 11800.0 -118000.0
    238 11900.0 -119000.0
    240 12000.0 -120000.0
    242 12100.0 -121000.0
    244 12200.0 -122000.0
    246 12300.0 -123000.0
    248 12400.0 -124000.0
    250 12500.0 -125000.0
    252 12600.0 -126000.0
    254 12700.0 -127000.0
    256 12800.0 -128000.0
    

    As you can see there are record markers 1400 0000 on each record, meaning the records are 20 bytes long (1 fortran INTEGER@4, 2 fortran DOUBLE@8) with 4 bytes of record marker at each end - which I read into a variable called _ and discarded in Python. If you don't want that, you need to use DIRECT /STREAM output in fortran - but I don't know if you can control that at the other end of your processing chain. See here for a description.

    Keywords: fortran, Python, binary, unformatted, xxd, dump, records, view as integer, view as float, view as float64, view as double