pythoncurve-fittingcurvepoisson

Histogram doesn't fit the poisson distribution when the image is corrupted by Poisson process


I am trying to corrupt the sample image with the Poisson process using the code (shown below) provided in the link Adding poisson noise to an image. The corrupted image is shown in Figure 1. Then I select a homogeneous ROI (shown within red color) from the image and analyze the distribution of the intensity values. This should also follow Poisson distribution.

import numpy as np
image = read_image("YOUR_IMAGE")  
noisy = np.random.poisson(image / 255.0 * PEAK) / PEAK * 255 

enter image description here

#Issue1: When I try to fit the Poisson curve using the code below, the distribution doesn’t seem to fit the histogram properly as shown in Figure 2.

#Read the image
clean_img = cv2.imread("my_image.png") 
clean_gray = cv2.cvtColor(clean_img, cv2.COLOR_BGR2GRAY)
#Convert to 0-1 range
clean_gray = (clean_gray-clean_gray.min())/(clean_gray.max()-clean_gray.min())

#Corrupt the image using poisson process and convert it back to the range 0-255
peak = 10
noisy_gray = np.random.poisson(clean_gray*peak)/peak
noisy_gray = (((noisy_gray-noisy_gray.min())/(noisy_gray.max()-noisy_gray.min()))*255).astype(np.uint8)

#Define the fit function
def fit_function(x, lambda_p):
         # Ensure x is an array of integers
        pois_pmf = (np.exp(-lambda_p)* (lambda_p ** x) )/ factorial(x+0.00001)
        #print(pois_pmf.dtype)
        return pois_pmf

def calculate_fit_curve(noisy_roi):
    # Plot histogram of the selected ROI
    plt.figure(figsize=(6, 4))
    bins = np.arange(150) - 0.05
    count, bin_edges, _ = plt.hist(noisy_roi.ravel(), bins=bins, density=True, facecolor='steelblue', label='Data')    
    x = ((bin_edges[:-1] + bin_edges[1:]) / 2)

    # Fit the sample data to Poisson distribution using curve_fit
    pois_params, _ = curve_fit(fit_function, x, count, maxfev=1000)
    pois_opt = fit_function(x, *pois_params)
    plt.plot(x, pois_opt,label='Poisson', color='r')
    plt.xlabel('Intensity values')
    plt.ylabel('Normalized density')
    plt.title('Histogram with Fitted Distributions')
    plt.legend()
    plt.show()

enter image description here

Remark: However, when the distribution fit is directly applied to the samples obtained using np.random.poisson(lambda)(as given in the code below), then the distribution fits perfectly as observed in Figure 3.

data = np.random.poisson(lam=4, size=1000)
# Bins should be of integer width, because Poisson is an integer distribution
bins = np.arange(15) - 0.5
count, bin_edges, patches = plt.hist(data, bins=bins, density=True, label='Data')
# calculate bin centers
x = 0.5 * (bin_edges[1:] + bin_edges[:-1])

def fit_function(k, lamb):
    '''poisson function, parameter lamb is the fit parameter'''
    return poisson.pmf(k, lamb)

# fit with curve_fit
parameters, cov_matrix = curve_fit(fit_function, x, count, maxfev=1000)
plt.plot(x,fit_function(x, *parameters),label='Poisson')
plt.legend()
plt.show()

enter image description here

#Issue 2: With respect to Issue 1, if we increase the number of bins further, (eg: bins = np.arange(171) - 0.05), then the line pois_params, _ = curve_fit(fit_function, x, count, maxfev=1000) throws an error given below and the curve fit goes far from the histogram as shown in Figure 4.

RuntimeWarning: **overflow encountered in power**
pois_pmf = (np.exp(-lambda_p)* (lambda_p ** x) )/ factorial(x+0.00001)
plot_dist_all_without_noise.py:113: RuntimeWarning: **invalid value encountered in true_divide**
pois_pmf = (np.exp(-lambda_p)* (lambda_p ** x) )/ factorial(x+0.00001)

I want to plot all the intensity values between 0 to 255 instead of limiting the values in x axis. enter image description here

Issue3: As the peak increases, say peak=30, and the bin values are still between 0 to 170, I get Optimal parameters not found error as shown below.

pois_params, _ = curve_fit(fit_function, x, count, maxfev=1000)
  File "C:\Users\admin\anaconda3\envs\retinexdip\lib\site-packages\scipy\optimize\minpack.py", line 748, in curve_fit
    raise RuntimeError("Optimal parameters not found: " + errmsg)
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 1000.

So, inorder to solve this issue, if I try to increase the limit to say maxfev=10000 or don’t set the limit at all, then I still get the error overflow encountered in power explained in Issue 2.

Seeking the help. Thank you.

Edit: The corresponding clean image of Figure 1 is shown below enter image description here


Solution

  • It seems the problem is the following: there are a lot of 0s in the dependent data to be fitted, i.e. in count variable, please have a look here: (You can see that from your histogram plots as well, there are gaps in Figure 2 and 4, unlike Figure 3)

    array([0.003, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.02 , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.044, 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.073, 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.141, 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.17 ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.162, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.146, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.093, 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.065, 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.04 , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.019,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.013, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.007, 0.   , 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.003, 0.   , 0.   , 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.001, 0.   , 0.   ,
           0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ])
    
    

    And since AUC of PMF of Poisson distribution = 1, it fits that way (AUC - area under the curve, PMF - probability mass function). One of the possible solutions would be to normalize the image not to range 0-255, but to 0-25, i.e. noisy_gray = (((noisy_gray-noisy_gray.min())/(noisy_gray.max()-noisy_gray.min()))*25).astype(np.uint8) like this:

    clean_img = cv2.imread("my_image.png") 
    clean_gray = cv2.cvtColor(clean_img, cv2.COLOR_BGR2GRAY)
    #Convert to 0-1 range
    clean_gray = (clean_gray-clean_gray.min())/(clean_gray.max()-clean_gray.min())
    
    #Corrupt the image using poisson process and convert it to the range 0-25
    peak = 10
    noisy_gray = np.random.poisson(clean_gray*peak)/peak
    # noisy_gray = np.random.poisson(clean_gray)
    noisy_gray = (((noisy_gray-noisy_gray.min())/(noisy_gray.max()-noisy_gray.min()))*25).astype(np.uint8)
    
    def calculate_fit_curve(noisy_roi):
        # Plot histogram of the selected ROI
        plt.figure(figsize=(6, 4))
        bins = np.arange(noisy_roi.max()) - 0.05
        count, bin_edges, _ = plt.hist(noisy_roi.ravel(), bins=bins, density=True, facecolor='steelblue', label='Data')    
        x = ((bin_edges[:-1] + bin_edges[1:]) / 2)
    
        # Fit the sample data to Poisson distribution using curve_fit
        pois_params, _ = curve_fit(fit_function, x, count, maxfev=1000)
        pois_opt = fit_function(x, *pois_params)
        plt.plot(x, pois_opt,label='Poisson', color='r')
        plt.xlabel('Intensity values')
        plt.ylabel('Normalized density')
        plt.title('Histogram with Fitted Distributions')
        plt.legend()
        plt.show()
    
    

    The output would be similar to the following:

    enter image description here

    Alternatively you can try to find better way of calculating histogram bins, such that there will be small amount of 0 values in the count variable and will have ~100 bins to have decent amount of training data to be fitting.

    In Issue2 you have similar problem, and Issue3 is just overflow issue, can't deal with big numbers.