I have a DataFrame consisting of two columns as follows:
col1 col2
0.33 4.33
0.21 4.89
3.2 18.78
6.22 0.05
6.0 2.1
... ...
... ...
Now I would like to create a 200 x 200 numpy array by binning both columns. The x-axis should be col1 and the y-axis should be col2. col1 should be binned logarithmically from 0 to 68 and col2 logarithmically from 0 to 35. I would like to use logarithmic binning because there are more small values than large ones (i.e. the bins get wider as the values grow). The 200 x 200 array should then store the number of samples in each bin (i.e. the count).
Is this possible to do in an efficient way?
Something like this might work for you (note that a logarithmic scale cannot start at 0, so you have to choose how close to zero the lower end is; 0.001 is used here):
import numpy as np
# 201 edges give 200 bins per axis
bins1 = np.logspace(np.log10(0.001), np.log10(68), num=201)
bins2 = np.logspace(np.log10(0.001), np.log10(35), num=201)
result = np.histogram2d(df['col1'], df['col2'], bins=[bins1, bins2])
...where result[0] are the counts in the bins, and result[1] and result[2] are the bin edges (the same as bins1 and bins2).
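One thing to watch for: np.histogram2d silently drops samples that fall outside the outermost edges, so exact zeros (or anything below the chosen lower end) would not be counted. Below is a minimal, self-contained sketch of one way to handle that, by clipping the values into the edge range before binning. The synthetic data, the name `lower`, and the decision to clip rather than drop are all assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real DataFrame (assumption: non-negative values,
# with many more small values than large ones).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.exponential(scale=5.0, size=10_000),
    "col2": rng.exponential(scale=3.0, size=10_000),
})

lower = 1e-3  # hypothetical lower end; a log scale cannot start at 0

# 201 edges give 200 bins per axis.
bins1 = np.logspace(np.log10(lower), np.log10(68), num=201)
bins2 = np.logspace(np.log10(lower), np.log10(35), num=201)

# Assumption: out-of-range samples should land in the first or last bin
# instead of being dropped, so clip to the actual outermost edges.
x = df["col1"].clip(lower=bins1[0], upper=bins1[-1])
y = df["col2"].clip(lower=bins2[0], upper=bins2[-1])

counts, xedges, yedges = np.histogram2d(x, y, bins=[bins1, bins2])

print(counts.shape)       # (200, 200)
print(int(counts.sum()))  # equals len(df), since nothing was dropped
```

Here counts[i, j] is the number of samples whose col1 value falls in bin i and whose col2 value falls in bin j, so col1 runs along the first axis. If you plot it with an image function that puts rows on the vertical axis, you will probably want counts.T. If clipping is not appropriate for your data, skip that step and compare counts.sum() with len(df) to see how many samples were left out.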