Function with two variables - Polynomial fit Python

I have various pump performance data, and I am trying to fit curves as a combination of them.

In below image, different curves are at different pump speeds (RPM), and the Y axis is Pressure head, and X axis is flow rate.

I am trying to figure out the following functions:

pressure = func (flowrate, rpm)
flowrate = func (pressure, rpm)
rpm = func (flowrate, pressure)

Usually the pump curves follow some form of affinity laws, as

flow rate is proportional to rpm.
pressure is proportional to square of rpm.

However in reality - they do not, and there will be a difference between experimental data & theoretical data. So that is why I am looking to fit a function that looks like this: (open to other suggestions) - like 3rd degree polynomial as well.

Y = a*x1**2 + b*x1+ c*x2**2 + d*x2 + e*x1*x2 + f

However, I am getting the wrong results with scipy - curve_fit. I am not in favor of employing sklearn regression as we would not be able to figure out the coefficients.

Is there a way one can employ this in Python?

See my code below so far:

from numpy import genfromtxt
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
# df = pd.read_excel('./nominal_data/pump_curve.xlsx', sheet_name =0)
df = pd.DataFrame({
    'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
    'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
    'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})
flow_lpm = df['Flow(lpm)'].to_numpy()
rpm = df['Speed Feedback (RPM)'].to_numpy()
dP = df['dP (PSI)'].to_numpy()
X = (flow_lpm, rpm)
y = dP

def f(X, a, b, c, d, e, f):
    x_1, x_2 = X
    return a*x_1**2 + b*x_1 + c*x_2**2 + d*x_2 + e*x_1*x_2 + f
param, param_cov = curve_fit(f, X, y)

# Test a random data point. 
flow_test = 54
rpm_test = 5195

dp_test = param[0]*flow_test**2 + param[1]*flow_test + param[2]*rpm_test**2 + param[3]**rpm_test + param[4]*flow_test*rpm_test + param[5]
print (dp_test)

I could have multiple curves of Pressure = function (flow rate) at multiple RPMs like the figure shows. For example there is a 3rd degree curve for a 5100 RPM curve. However it only works for a given RPM. The curve is different for a 2070 RPM situation.

Seeking suggestions on proper curve fitting methodology. Open to deleting end points, to ensure the curve fit works in the middle range.

datafile_imagefile

Solution

With the following dataset you provided:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import optimize

data = pd.DataFrame({
    'Flow(lpm)': [128.0846942, 105.7579874, 95.11146262, 69.57902038, 53.25344504, 35.47260492, np.nan, 131.96, 110.57, 91.32, 73.02, 53.9, 20.41, np.nan, 116.06, 99.46, 83.7, 68.84, 54.47, 20.98, np.nan, 103.0, 87.6, 73.6, 57.8, 44.0, 19.49, np.nan, 86.2, 73.1, 56.0, 42.6, 28.1, 16.33, np.nan, 56.2, 47.1, 38.6, 30.9, 24.0, 12.3],
    'Speed Feedback (RPM)': [5204.0, 5208.0, 5206.0, 5206.0, 5176.0, 5175.0, np.nan, 4710.72, 4706.4, 4714.93, 4687.11, 4691.0, 4602.0, np.nan, 4103.21, 4115.26, 4147.8, 4148.14, 4141.09, 4124.72, np.nan, 3675.89, 3657.88, 3673.73, 3671.41, 3675.27, 3664.88, np.nan, 3118.66, 3186.23, 3106.92, 3107.19, 3114.69, 3090.08, np.nan, 2077.44, 2073.23, 2062.01, 2069.37, 2068.02, 2067.91],
     'dP (PSI)': [16.5, 25.34, 28.78, 35.45, 37.86, 38.87, np.nan, 8.85, 17.01, 23.42, 27.48, 30.5, 32.4, np.nan, 6.69, 11.84, 17.24, 20.16, 22.64, 25.81, np.nan, 5.2, 9.6, 13.2, 16.3, 18.1, 20.38, np.nan, 3.7, 6.5, 10.0, 12.1, 13.5, 14.54, np.nan, 1.2, 2.7, 3.7, 4.7, 5.2, 6.3]
})

Remove nans:

data = data.dropna()

Define your model as follow:

def model(x, a, b, c, d, e, f):
    return a * x[:,0] ** 2 + b * x[:,0] + c * x[:,1] ** 2 + d * x[:,1] + e * x[:,0] * x[:,1] + f

Then optimize:

popt, pcov = optimize.curve_fit(model, data.iloc[:,:2].values, data.iloc[:,2].values)

Now we can regress the model on a finer grid:

x0lin = np.linspace(data.iloc[:,0].min(), data.iloc[:,0].max(), 200)
x1lin = np.linspace(data.iloc[:,1].min(), data.iloc[:,1].max(), 200)
X0, X1 = np.meshgrid(x0lin, x1lin)

Z = model(np.array([X0.ravel(), X1.ravel()]).T, *popt).reshape(X0.shape)

It renders as follow:

fig, axe = plt.subplots(subplot_kw={"projection": "3d"})
axe.scatter(*data.values.T)
axe.plot_surface(X0, X1, Z, cmap="viridis", alpha=0.35)