pythonscipyregressioncurve-fitting

Function with two variables - Polynomial fit Python


I have various pump performance data, and I am trying to fit curves as a combination of them.

In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.

I am trying to figure out the following functions:

pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)

Usually the pump curves follow some form of affinity laws, as

  1. flow rate is proportional to rpm.
  2. pressure is proportional to square of rpm.

However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.

Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f

However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.

Is there a way one can employ this in Python?

See my code below so far:

from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
    'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
    'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
    'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP

def f(X, a, b, c, d, e, f):
    x_1, x_2 = X
    return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)

# Test a random data point. 
flow_test = 54
rpm_test = 5195

dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)

I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.

Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.

datafile_imagefile


Solution

  • With the following dataset you provided:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from scipy import optimize
    
    data = pd.DataFrame({
        'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
        'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
         'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
    })
    

    Remove nans:

    data = data.dropna()
    

    Define your model as follow:

    def model(x, a, b, c, d, e, f):
        return a * x[:,0] ** 2 + b * x[:,0] + c * x[:,1] ** 2 + d * x[:,1] + e * x[:,0] * x[:,1] + f
    

    Then optimize:

    popt, pcov = optimize.curve_fit(model, data.iloc[:,:2].values, data.iloc[:,2].values)
    

    Now we can regress the model on a finer grid:

    x0lin = np.linspace(data.iloc[:,0].min(), data.iloc[:,0].max(), 200)
    x1lin = np.linspace(data.iloc[:,1].min(), data.iloc[:,1].max(), 200)
    X0, X1 = np.meshgrid(x0lin, x1lin)
    
    Z = model(np.array([X0.ravel(), X1.ravel()]).T, *popt).reshape(X0.shape)
    

    It renders as follow:

    fig, axe = plt.subplots(subplot_kw={"projection": "3d"})
    axe.scatter(*data.values.T)
    axe.plot_surface(X0, X1, Z, cmap="viridis", alpha=0.35)
    

    enter image description here