pandas dataframe histogram binning histogram2d

Creating a 2D image by logarithmic binning


I have a DataFrame consisting of two columns as follows:

col1      col2
0.33      4.33
0.21      4.89
3.2       18.78
6.22      0.05
6.0       2.1
...       ...
...       ...

Now I would like to create a 200 x 200 numpy array by binning both columns. The x-axis should be col1 and the y-axis should be col2. col1 should be binned logarithmically from 0 to 68 and col2 logarithmically from 0 to 35. I would like to use logarithmic binning (i.e. the bins get wider as the values get larger) because there are many more small values than large ones. The 200 x 200 array should then store the number of samples in each bin (i.e. the count).

Is this possible to do in an efficient way?


Solution

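  • Something like this might work for you (note that you have to choose how close to zero the lower end is, because logarithmically spaced bins cannot start at exactly 0; samples below that lower edge are not counted):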

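    import numpy as np

    # 201 log-spaced edges give 200 bins per axis; the lower edge must be > 0
    bins1 = np.logspace(np.log10(0.001), np.log10(68), num=201)
    bins2 = np.logspace(np.log10(0.001), np.log10(35), num=201)

    # Count the samples falling into each (col1, col2) bin
    result = np.histogram2d(df['col1'], df['col2'], bins=[bins1, bins2])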
    

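    ...where result[0] holds the counts in the bins (a 200 x 200 array), and result[1] and result[2] are the bin edges (the same as bins1 and bins2). Note that col1 runs along the first axis of result[0], so use result[0].T if you want col1 on the horizontal axis when showing the array as an image.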
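For reference, here is a self-contained sketch of the same approach that can be run as-is. The DataFrame is filled with made-up uniform random data as a stand-in for the real one, and the lower edge of 0.001 is an assumption you should adapt to your data. It also reports how many samples fell outside the bin edges and so were not counted.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question (random data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col1": rng.uniform(0.0, 68.0, size=10_000),
    "col2": rng.uniform(0.0, 35.0, size=10_000),
})

lower = 0.001  # assumed lower edge; log-spaced bins cannot start at 0

# 201 edges per axis -> 200 bins per axis
bins1 = np.logspace(np.log10(lower), np.log10(68), num=201)
bins2 = np.logspace(np.log10(lower), np.log10(35), num=201)

counts, xedges, yedges = np.histogram2d(
    df["col1"], df["col2"], bins=[bins1, bins2]
)

# Samples outside the edges (e.g. below `lower`) are not counted.
n_dropped = len(df) - int(counts.sum())

# counts[i, j]: i indexes col1 bins, j indexes col2 bins.
# Transpose so that col1 runs along the horizontal axis of an image.
image = counts.T

print(counts.shape, n_dropped)
```

If exact zeros matter in your data, one option is to clip the columns to the lower edge before binning, so that they land in the first bin instead of being dropped.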