I have a simple dataframe that I want to plot together with its trend line (polynomial, order 2). However, the equation I get is obviously wrong:
y = 1.4x**2 + 6.6x + 0.9
It should be:
y = 0.22x**2 - 1.45x + 11.867
How can I get the correct equation?
import matplotlib.pyplot as plot
from scipy import stats
import numpy as np
data = [["2020-03-03",9.727273],
["2020-03-04",9.800000],
["2020-03-05",9.727273],
["2020-03-06",10.818182],
["2020-03-07",9.500000],
["2020-03-08",10.909091],
["2020-03-09",15.000000],
["2020-03-10",14.333333],
["2020-03-11",15.333333],
["2020-03-12",16.000000],
["2020-03-13",21.000000],
["2020-03-14",28.833333]]
fig, ax = plot.subplots()
dates = [x[0] for x in data]
usage = [x[1] for x in data]
bestfit = stats.linregress(range(len(usage)),usage)
equation = str(round(bestfit[0],1)) + "x**2 + " + str(round(bestfit[1],1)) + "x + " + str(round(bestfit[2],1))
ax.plot(range(len(usage)), usage)
ax.plot(range(len(usage)), np.poly1d(np.polyfit(range(len(usage)), usage, 2))(range(len(usage))), '--',label=equation)
plot.show()
print (equation)
Your question needs to be defined a bit more precisely; let me explain.
You are trying to fit a second-degree (quadratic) polynomial, using a series of dates as input and a series of values as output. The problem is that you have to define what "zero" is, that is, the reference point for the date values. Your code handles this by using the index of each date, starting from 0. That is reasonable, but you need to check that it fits the problem you are trying to solve.
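Before looking at the fit itself, it may help to see where the numbers in the original equation came from. stats.linregress fits a straight line, and the first three values it returns are the slope, the intercept and the correlation coefficient r. The sketch below recomputes those three quantities by hand with the standard least-squares formulas (it does not call SciPy; the variable names are mine), using the index of each date as x:

```python
from math import sqrt

# Usage values from the question, in date order.
usage = [9.727273, 9.8, 9.727273, 10.818182, 9.5, 10.909091,
         15.0, 14.333333, 15.333333, 16.0, 21.0, 28.833333]
x = list(range(len(usage)))  # 0, 1, ..., 11

n = len(usage)
mean_x = sum(x) / n
mean_y = sum(usage) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in usage)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, usage))

slope = sxy / sxx                     # straight-line slope
intercept = mean_y - slope * mean_x   # straight-line intercept
r_value = sxy / sqrt(sxx * syy)       # Pearson correlation coefficient

print(round(slope, 1), round(intercept, 1), round(r_value, 1))  # 1.4 6.6 0.9
```

These are the same three numbers that appeared in the "wrong" equation, which suggests the printed string was a linear fit's slope, intercept and r-value labelled as if they were quadratic coefficients.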
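Your original code computes 'bestfit' with stats.linregress, which fits a straight line and returns the slope, intercept and r-value, not quadratic coefficients. When I replace it with np.polyfit, the same function you already use to draw the trend line, I get results similar to the ones you wanted: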
Polynomial Equation: 0.22x^2 + -1.02x + 10.63
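The leading coefficient matches the expected equation, but the linear and constant terms do not. One plausible reason, and this is an assumption on my part rather than something the question states, is that the expected equation was produced by a tool that numbers the points from 1 instead of 0. A minimal sketch that fits the same data with both conventions (the variable names are illustrative):

```python
import numpy as np

# Usage values from the question, in date order.
usage = [9.727273, 9.8, 9.727273, 10.818182, 9.5, 10.909091,
         15.0, 14.333333, 15.333333, 16.0, 21.0, 28.833333]

n = len(usage)
x_from_0 = np.arange(n)          # first date is x = 0
x_from_1 = np.arange(1, n + 1)   # first date is x = 1 (assumed convention)

# np.polyfit returns coefficients from highest degree to lowest: [a, b, c].
coeffs_from_0 = np.polyfit(x_from_0, usage, 2)
coeffs_from_1 = np.polyfit(x_from_1, usage, 2)

print("x starts at 0:", np.round(coeffs_from_0, 2))
print("x starts at 1:", np.round(coeffs_from_1, 2))
```

Starting at 0 gives roughly 0.22, -1.02, 10.63, and starting at 1 gives roughly 0.22, -1.45, 11.87, which is close to the equation the question expected. Shifting x by one leaves a unchanged and turns b into b - 2a and c into a - b + c, so both equations describe the same curve.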
Two things can help you understand why my results differ from the ones you wanted:
Here is the updated code:
import matplotlib.pyplot as plot
from scipy import stats
import numpy as np
data = [["2020-03-03",9.727273],
["2020-03-04",9.800000],
["2020-03-05",9.727273],
["2020-03-06",10.818182],
["2020-03-07",9.500000],
["2020-03-08",10.909091],
["2020-03-09",15.000000],
["2020-03-10",14.333333],
["2020-03-11",15.333333],
["2020-03-12",16.000000],
["2020-03-13",21.000000],
["2020-03-14",28.833333]]
fig, ax = plot.subplots()
dates = [x[0] for x in data]
usage = [x[1] for x in data]
# Fit a second-degree polynomial; coefficients come back highest degree first: [a, b, c].
x = range(len(usage))
bestfit = np.polyfit(x, usage, 2)
equation = str(round(bestfit[0],2)) + "x**2 + " + str(round(bestfit[1],2)) + "x + " + str(round(bestfit[2],2))
ax.plot(x, usage)
# Reuse the same coefficients for the trend line so the label matches the plotted curve.
ax.plot(x, np.poly1d(bestfit)(x), '--', label=equation)
ax.legend()  # without this call the label is never displayed
plot.show()
print(equation)
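Building the label by joining the terms with "+" produces strings like "+ -1.02x" whenever a coefficient is negative. As an optional refinement, here is a small sketch of a formatter that writes each sign once. The function name format_quadratic is hypothetical, and the coefficients passed to it are rounded approximations of the fit above, used only for illustration:

```python
def format_quadratic(a, b, c, ndigits=2):
    """Return 'y = ax^2 +/- bx +/- c' with each sign written once (hypothetical helper)."""
    def signed(value):
        # Pick the sign separately and format the magnitude, so "+ -1.02" never appears.
        sign = "-" if value < 0 else "+"
        return f" {sign} {abs(value):.{ndigits}f}"

    return f"y = {a:.{ndigits}f}x^2{signed(b)}x{signed(c)}"


# Illustrative coefficients, close to the fit with x starting at 0.
label = format_quadratic(0.2183, -1.0163, 10.6341)
print(label)  # y = 0.22x^2 - 1.02x + 10.63
```

With the code above, you could pass the three entries of bestfit to this helper and use the result as the label of the trend line.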