I would like to fit a curve that closely follows the shape of a scatterplot using Python.
I have looked into many curve-fitting techniques and have tried several polynomial and linear regression approaches using the Scikit-Learn library.
However, the best I could get is the following: [Image: The result of polynomial regression]
What I expect is something like the following, shown in green: [Image: The ideal result desired]
I have thought about simply connecting the dots, but I think that would not give me a concrete mathematical expression for the curve. So I need a way to draw a curve like the green one above that can be described by a mathematical expression.
Can you suggest ways to do this? I was also wondering whether combining logarithmic or exponential functions is necessary for this purpose. If anyone has a reasonable idea, I would appreciate your reply and would be glad to upvote!
To give you an idea of the code I used, here is the Python code that produced the result shown in the first picture.
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
# Load the point cloud data
data = pd.read_csv('data/selected_2.csv')
points = data.to_numpy()
points_x = points[:, 0]
points_y = points[:, 1]
points_z = points[:, 2]
poly = PolynomialFeatures(degree=40, include_bias=False)
poly_features = poly.fit_transform(points_x.reshape(-1, 1))
poly_reg_model = LinearRegression()
poly_reg_model.fit(poly_features, points_z)
z_predicted = poly_reg_model.predict(poly_features)
plt.figure(figsize=(10, 6))
plt.title("Your first polynomial regression – congrats! :)", size=16)
plt.scatter(points_x, points_z)
plt.plot(points_x, z_predicted, c="red")
plt.show()
The data I used is point cloud data with 'X', 'Y', and 'Z' columns, where each row gives the coordinates of one point. The original data has shape (23, 3). Since the 'Y' components are negligible in my case, I am only concerned with columns 'X' and 'Z', which is why I am working with a 2D graph.
Your code seems fine, but...
...as you formulated it, the polynomial curve is implicit (there are multiple z values for the same x).
Swapping the X and Z variables will make your curve explicit and therefore fittable, as it looks like a reasonable polynomial.
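
A minimal sketch of the swap, under some assumptions: the real file data/selected_2.csv is not available here, so the arrays below are a synthetic stand-in (a sideways parabola with 23 points), the helper name fit_x_as_poly_of_z is hypothetical, and np.polyfit is used in place of the scikit-learn pipeline from the question to keep the example short. The idea is simply to treat z as the independent variable and fit x = p(z).

```python
import numpy as np


def fit_x_as_poly_of_z(x, z, degree=2):
    """Least-squares fit of x as a polynomial in z; returns a np.poly1d."""
    coeffs = np.polyfit(z, x, deg=degree)  # note the order: z is the input
    return np.poly1d(coeffs)


# Synthetic stand-in for the point cloud (assumption, not the real data):
# a curve that has two z values for most x values.
z = np.linspace(-2.0, 2.0, 23)
x = 0.5 * z**2 - 1.0

model = fit_x_as_poly_of_z(x, z, degree=2)

# Evaluate the fitted expression x = p(z) on a dense grid of z values.
z_grid = np.linspace(z.min(), z.max(), 200)
x_fit = model(z_grid)

print(model)  # the fitted polynomial in terms of z
```

To draw the result in the original orientation, plot the fitted values with the axes swapped back, for example plt.plot(x_fit, z_grid) on top of plt.scatter(x, z). A low degree is used here on purpose; with only 23 points, a very high degree such as 40 is likely to overfit.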