I have a very simple question, but Google does not seem to be able to help me here. I want a subsample of a pyfits table... basically just remove 90% of the rows, or something like that. I read the table with:
data_table = pyfits.getdata(base_dir + filename)
I like the pyfits table organization where I access a field with data_table.field(fieldname), so I would like to keep the data structure, but remove rows.
You can use numpy.random.choice to create an array containing several random choices from another array.
In your case you want "x" rows from your data_table. You can't use choice directly on the table, but you can pass it the len of your table, in which case it draws from the row indices 0 to len(data_table) - 1:
import numpy as np
rows_numbers_to_keep = np.random.choice(len(data_table), 2, replace=False)
And then index your table:
subsample = data_table[rows_numbers_to_keep]
For example (I'm using astropy because PyFITS is no longer developed and has been migrated to astropy.io.fits):
>>> data
FITS_rec([(1, 4, 7), (2, 5, 8), (3, 6, 9), (4, 7, 0)],
dtype=(numpy.record, [('a', 'S21'), ('b', 'S21'), ('c', 'S21')]))
>>> data[np.random.choice(len(data), 2, replace=False)] # keep 2 distinct rows
FITS_rec([(1, 4, 7), (4, 7, 0)],
dtype=(numpy.record, [('a', 'S21'), ('b', 'S21'), ('c', 'S21')]))
If you want to allow the same row to be drawn several times, you can use replace=True instead.
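Since the question asks for dropping about 90% of the rows rather than keeping a fixed number, the row count can be derived from the table length. The following is only a sketch: it uses a plain NumPy structured array as a stand-in for the FITS table (FITS_rec is a numpy.recarray subclass, so I'd expect the indexing to work the same way, but I haven't checked every PyFITS/astropy version), and subsample_rows, fraction and seed are names I made up for illustration.

```python
import numpy as np

def subsample_rows(table, fraction=0.1, seed=None):
    """Return a random subset of rows, keeping the original row order.

    Hypothetical helper: `fraction` is the share of rows to keep,
    `seed` makes the selection reproducible.
    """
    rng = np.random.RandomState(seed)
    n_keep = int(round(len(table) * fraction))
    # Draw distinct row indices from 0 .. len(table) - 1
    keep = rng.choice(len(table), n_keep, replace=False)
    keep.sort()  # optional: preserve the row order of the input table
    return table[keep]

# Stand-in for the FITS table (assumption: a structured array with 100 rows)
demo_table = np.zeros(100, dtype=[('a', 'i4'), ('b', 'f8')])
demo_table['a'] = np.arange(100)

demo_subsample = subsample_rows(demo_table, fraction=0.1, seed=0)
```

The result keeps the same dtype and field names as the input, so field access on the subsample should work as before. Sorting the indices is a design choice; drop that line if the row order does not matter.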