pythonnumpyscipycurve-fittingdata-fitting

Fitting data with a three-parameter Weibull distribution function


In my data, the first column is the x values, and the second column is the y values. I want to fit the data with a three-parameter Weibull function to describe the distribution.

I tried to find solutions by many searches, but a similar post on Stack Overflow seems to be only for one-column data. I also tried with the libraries scipy.stats.exponweib and scipy.stats.weibull_min, but failed due to the same problem.

So I tried to define the function and to fit the data with scipy.optimize.curve_fit. The following is the code I used. The data.txt file can be downloaded by this link.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Load data
poresize, psd, psd_std = np.loadtxt("data.txt", unpack=True)

# Define the Weibull distribution function with the requested form
def weibull_func(x, a, b, c):
    return a * b * (x - c) ** (b - 1) * np.exp(-a * (x - c) ** b)

# Perform curve fitting
popt, pcov = curve_fit(weibull_func, poresize, psd, p0=[1, 1, 1])

# Plot the original data and the fitted curve
plt.scatter(poresize, psd, label='Data')
x_range = np.linspace(min(poresize), max(poresize), 100)
plt.plot(x_range, weibull_func(x_range, *popt), 'r-', label='Fitted curve (Weibull)')
plt.xlabel('Particle Size')
plt.ylabel('PSD')
plt.title('Fitting Weibull Distribution')
plt.legend()
plt.grid(True)
plt.show()

# Display the optimized parameters
a_opt, b_opt, c_opt = popt
print("Optimized Parameters:")
print("a:", a_opt)
print("b:", b_opt)
print("c:", c_opt)

The function form is adapted from the Weibull distribution from Matlab, but x is replaced with x-c. The problem is that I tried many combinations of the initial guess values for p0, but none of them could give me a satisfactory result.

Could you please tell me whether there is something fundamentally wrong with the code? Or how can I get the proper initial guess fast?


Solution

  • Two pieces of advice when fitting a function:

    1. Understand the function you're trying to fit. In this case, you should realize that it is undefined for x < c when b is not an integer. (This is because you have (x-c)^(b-1). If x < c then x - c < 0. If b is not an integer then you will be taking a fractional power of a negative number, which produces a complex number.)
    2. Plot your data and play around with a model of your function to get a good guess. In this case, I plotted your data and saw that the spike is between x = 2 and x = 5 and it spikes up to y = 1. Knowing that, I made this Desmos model and played with the variables until I got something decent.

    Here is the fixed function that can handle x < c:

    def weibull_func(x, a, b, c):
        res = np.zeros_like(x)
        cond = x > c
        xcond = x[cond]
        res[cond] = a*b*(xcond - c)**(b - 1)*np.exp(-a*(xcond - c)**b)
        return res
    

    I provided the curve_fit call with the guess I got using Desmos:

    popt, pcov = curve_fit(weibull_func, poresize, psd, p0=[1.4, 1.93, 2.4])
    

    The tuned parameters are:

    a = 1.1172264863332277
    b = 2.2231297860177226
    c = 2.301372954142873
    

    Resulting plot: