python-3.xnumpyastropypyfits

Writing a new FITS file from old data with astropy


I have a very simple operation to perform on a FITS file (data is numpy array format) but I cannot get astropy to either save it as a new file or overwrite the existing one.

I'm re-writing some old code that used the numpy pyfits module to work with astronomical FITS files - I want to update this to use the astropy.io fits module. Specifically, some data that I work with is 3D and some is 4D. The 4D stuff is just a convention - the 4th axis contains no useful information (an example of the data can be found here : http://www.mpia.de/THINGS/Data_files/NGC_628_NA_CUBE_THINGS.FITS). So I prefer to remove the extra axis and then the rest of my code can carry on without any special requirements.

Here's the old pyfits-based code I used that works perfectly :

import numpy
import pyfits

filename = 'NGC628.fits'
outfile  = 'NGC628_reshaped.fits'

# Get the shape of the file
fitsfile=pyfits.open(filename)
image = fitsfile[0].data
header =fitsfile[0].header
z = image.shape[1]  # No. channels                  
y = image.shape[2]  # No. x pixels
x = image.shape[3]  # No. y pixels

newimage = numpy.reshape(image,[z,y,x])

pyfits.core.writeto(outfile,newimage,header, clobber=True)

Nothing complicated there, it just reshapes an array and saves it to a new file. Marvellous. Now I want to replace this with the astropy fits module. If I do :

import numpy
from astropy.io import fits as pyfits

fitsfile = pyfits.open('NGC628.fits', mode='update')
image  = fitsfile[0].data
header = fitsfile[0].header
z = image.shape[1]                  
y = image.shape[2]
x = image.shape[3]
image = numpy.reshape(image,[z,y,x])

... then so far, so good. The "image" array is reshaped correctly, as image.shape confirms. But I cannot for the life of me figure out how to save this to a new (or the old) FITS file. Using the old syntax :

fitsfile.writeto('NGC628_2.fits',image,header)

... gives an error message, "AttributeError: 'numpy.ndarray' object has no attribute 'lower'. If instead, based on the astropy documentation, I simply omit the image and header and try and save to the original file :

fitsfile.writeto('NGC628.fits')

Then I get an error that the file already exists. If I provide the keyword "overwrite=True", then it complains about "WinError 32 : The process cannot access the file because it is being used by another process : NGCC628.fits". The file is definitely NOT open in any other programs.

If I specify the new filename NGC628_2.fits, then Python crashes (sends me back to the command prompt) with no errors. A very small file is written that contains only the header data and none of the image data. EDIT : exactly the same thing happens if I use the correct command to write a new FITS file using image and header data, e.g. pyfits.writeto('NGC628_2.fits',image,header).

Just to make things even more confusing, if I do a slightly simpler operation, say, setting all the image data to a constant value and then closing the file :

import numpy
from astropy.io import fits as pyfits
fitsfile = pyfits.open('NGC628.fits', mode='update')
image  = fitsfile[0].data
header = fitsfile[0].header
image[:,:,:,:] = 5.0
fitsfile.close()

Then this works - the original file is now an array where every value is equal to 5. I gather from the astropy documentation that simply opening the file in update mode and closing it should be enough, and in this case it is. But this same trick does not work when reshaping the image - the FITS file is unaltered.

So what the heck am I doing wrong ? Either updating the original or saving to a new file would be fine (preferably the latter) but I cannot get either operation to work correctly.

EDIT : I have Python version 3.5.3, numpy version 1.17.3, astropy version 3.2.3, and I'm running Windows 10.


Solution

  • Okay, I think you made one small mistake early on that you kept simply overlooking, and that sent you down a wild goose chase to find a different solution. Your other attempts that didn't work also would have likely worked except for a small, subtle thing as well. I'll go over what went wrong in the first place, and then explain what went wrong in some of the subsequent cases, so this is a bit long...

    First of all, in your original code you had:

    pyfits.core.writeto(outfile,newimage,header, clobber=True)
    

    This could be written more simply:

    pyfits.writeto(outfile, newimage, header, clobber=True)
    

    The pyfits.core module is an implementation detail, and there's almost no reason to reference it directly. You can just call this function directly as pyfits.writeto().

    If you had that, then your existing code would have worked exactly as before by changing the import to

    from astropy.io import fits as pyfits
    

    The only caveat being that the clobber argument is renamed overwrite, though I think clobber would have still worked, with a deprecation warning.

    First mistake

    What you changed it to instead, and seem to have overlooked, was:

    fitsfile.writeto('NGC628_2.fits',image,header)
    

    This is not the same thing. In the first case it was the high-level writeto() "convenience function". But now you're calling fitsfile.writeto. Here fitsfile is not the astropy.io.fits module, but is the file object itself:

    >>> type(fitsfile)
    <class 'astropy.io.fits.hdu.hdulist.HDUList'>
    

    The HDUList class also has method HDUList.writeto which performs a similar function, but it takes different arguments. You could check this in a number of ways, like:

    >>> help(fitsfile.writeto)
    Help on method writeto in module astropy.io.fits.hdu.hdulist:
    
    writeto(fileobj, output_verify='exception', overwrite=False, checksum=False) method of astropy.io.fits.hdu.hdulist.HDUList instance
        Write the `HDUList` to a new file.
    

    Its only required argument is the filename to write the file out to. Unlike the other writeto it does not accept data or header arguments, because HDUList is already a collection of HDUs which already have associated data and headers. In fact, if you just want to make some changes to an existing FITS file and write those changes out to a new file, I would not use the high-level writeto() like you did originally--it's mostly useful if you are creating an array from scratch and just want to write it quickly out to a new FITS file. So your original code could also be written like this:

    import numpy as np
    import astropy.io.fits as pyfits
    
    filename = 'NGC628.fits'
    outfile  = 'NGC628_reshaped.fits'
    
    # Get the shape of the file
    fitsfile = pyfits.open(filename)
    image = fitsfile[0].data
    z = image.shape[1]  # No. channels                  
    y = image.shape[2]  # No. x pixels
    x = image.shape[3]  # No. y pixels
    
    # Replace the primary HDU's data in-place; this just manipulates the
    # in-memory HDU data structure; it does not change anything on disk
    fitsfile[0].data = np.reshape(image, [z,y,x])
    
    # Write the new HDU structure to outfile
    fitsfile.writeto(outfile, overwrite=True)
    

    Update mode

    Next you tried modifying the file by opening it with mode='update', but this isn't really how update mode is intended to be used. mode='update' is more similar to opening a plain text file in write-mode like open('foo.txt', 'w'): It is intended for modifying an existing file in-place on disk. Any modifications you make are flushed out to disk when the file is closed, and there is no need to use writeto() or anything like that.

    You wrote:

    Using the old syntax :

    fitsfile.writeto('NGC628_2.fits',image,header)

    But as I wrote previously, there's no "old syntax" here, you were just trying to use the HDUList.writeto() method instead of the writeto() function.

    If I provide the keyword "overwrite=True", then it complains about "WinError 32 : The process cannot access the file because it is being used by another process : NGCC628.fits". The file is definitely NOT open in any other programs.

    I see that you're on Windows--this is a particular limitation of Windows in general: Windows does not allow writing to, moving, or deleting a file if there are already any open handles to that file. I think the error message here is misleading (this message comes straight from Windows itself): "it is being used by another process". It should be something like "it is being used by this process or another process". When you did:

    fitsfile = pyfits.open('NGC628.fits', mode='update')
    

    you now had an open handle to the file, so Windows won't let you overwrite that file without first closing it (e.g. fitsfile.close()).

    If I specify the new filename NGC628_2.fits, then Python crashes (sends me back to the command prompt) with no errors. A very small file is written that contains only the header data and none of the image data.

    Now this sounds like an actual bug. It sounds like your Python interpreter segfaulted. I can't explain that. But I tried reproducing it (on Windows) by following the same sequence of steps that you described, and I was not able. The final fitsfile.writeto('NGC628_2.fits') worked without problem. One thing I can imagine is that when you open a file with mode='update', the internal state management can become quite complex, as it has to keep track of everything you change about the FITS data structure so that it can properly move things around on disk in an existing file. It's possible that in the course of trying a number of mildly pathological operations (like trying to overwrite the file while it was in the middle of being updated) something got into an undefined state. I couldn't figure out how to do the same thing though.

    A "correct" usage of mode='update' to modify a file in-place might look something like:

    with pyfits.open(filename, mode='update') as fitsfile:
        image = fitsfile[0].data
        z, y, x = image.shape[1:]
        # Replace the original HDU data in-place
        fitsfile[0].data = image.reshape((z, y, x))
    

    and that's it! I would get used to using the with statement: When you use with you perform any actions that require you to have a file open inside the body of the with statement, and then when the with statement is exited, it will ensure that all changes you made (if any) are flushed to disk, and that the file is closed. I would use this even when just opening a file in read-only mode. This is especially important on Windows since, as you found, Windows is particularly picky about whether or not you've properly closed files.

    Similarly, your final example could be rewritten:

    with pyfits.open('NGC628.fits', mode='update') as fitsfile:
        image  = fitsfile[0].data
        header = fitsfile[0].header
        image[:,:,:,:] = 5.0
    

    Finally, you wrote:

    I gather from the astropy documentation that simply opening the file in update mode and closing it should be enough, and in this case it is. But this same trick does not work when reshaping the image - the FITS file is unaltered.

    But I'm not sure what you tried here; you didn't show that attempt. If I had to guess you wrote something like:

        image = np.reshape(image, (z, y, x))
    

    But all this is doing is replacing the value pointed to by the image variable with a new array. It's not updating the original fitsfile[0].data. The previous example of image[:] = 0.5 works because here image points to the same array as fitsfile[0].data and you are modifying its contents in-place. But np.reshape(...) is not an in-place operation; it returns a new array. See my previous example for the correct approach to this.