In the NumPy module, when you call np.polynomial.Polynomial.fit with full=True and you look at the residuals value that's returned, you get an object of type array. If this value is always a single number, why is it returned as an array?
Because what Polynomial.fit basically does is call lstsq.
Here is an artificial example:
import numpy as np
x=np.linspace(-1,1,11) # I chose this range out of laziness: the domain then matches the default window [-1, 1], so the returned coefficients need no conversion
y=12+x+3*x**2+0.5*x**3
np.polynomial.polynomial.Polynomial.fit(x,y,3,full=True)
# (Polynomial([12. , 1. , 3. , 0.5], domain=[-1., 1.], window=[-1., 1.], symbol='x'), [array([7.69471606e-30]), 4, array([1.38623557, 1.32269864, 0.50046809, 0.27991237]), 2.4424906541753444e-15])
And now, redoing it myself with lstsq (find the coefficients such that the linear combination of the columns of M with those coefficients gives y):
# Columns of M are 1, x, x², x³.
M=np.array([np.ones_like(x), x, x**2, x**3]).T
np.linalg.lstsq(M, y)
# So the returned coefficients cf are such that y ≈ cf[0]+cf[1]*x+cf[2]*x²+cf[3]*x³,
# i.e. they are the coefficients of the degree-3 polynomial
# (array([12. , 1. , 3. , 0.5]), array([1.7292925e-30]), 4, array([3.60116183, 2.60171483, 1.07908918, 0.50695163]))
So, same coefficients, of course. I didn't add any noise, so the fit is perfect.
And the other returned values are the same kind as the ones returned by Polynomial.fit: sum of squared residuals, rank of the matrix, singular values.
(Focus on the form of the answer rather than on the values: this is not exactly the least-squares problem that Polynomial.fit solves.)
So, Polynomial.fit returns the sum of squared residuals as a singleton array because it passes along what lstsq returns, and a singleton array is what lstsq returns.
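A minimal sketch to check this, reusing the same x and y as above. The unpacking names (resid, rank, sv, rcond) are just my labels for the four items of the diagnostic list:

```python
import numpy as np

x = np.linspace(-1, 1, 11)
y = 12 + x + 3 * x**2 + 0.5 * x**3

# With full=True, fit returns (series, [resid, rank, sv, rcond])
poly, (resid, rank, sv, rcond) = np.polynomial.Polynomial.fit(x, y, 3, full=True)

# resid is a NumPy array holding a single element, not a plain float
print(type(resid), resid.shape)
```

The value itself is tiny rounding noise here, since the data has no noise; only the type and shape matter.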
Now, of course, your next question has to be "but then, why does lstsq do that?"
Because, unlike Polynomial.fit, lstsq can be called with an array of y vectors (so a 2D array).
For example:
y = np.array([12+x+3*x**2+0.5*x**3, 1+x]).T
np.linalg.lstsq(M, y)
#(array([[1.20000000e+01, 1.00000000e+00],
# [1.00000000e+00, 1.00000000e+00],
# [3.00000000e+00, 7.77156117e-16],
# [5.00000000e-01, 0.00000000e+00]]),
# array([1.72929250e-30, 4.38386909e-31]), 4, array([3.60116183, 2.60171483, 1.07908918, 0.50695163]))
As you can see, 2 questions (2 polynomials to guess), so 2 answers: 2 sets of coefficients and 2 sums of squared residuals (but only one rank and one set of singular values, since those depend only on M, and there is only one M).
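To look only at the shapes, here is a small sketch comparing a 1D y with a 2D one. I build M with np.vander(..., increasing=True), which gives the same columns 1, x, x², x³, and pass rcond=None only to silence the warning older NumPy versions emit:

```python
import numpy as np

x = np.linspace(-1, 1, 11)
M = np.vander(x, 4, increasing=True)  # columns: 1, x, x², x³

y1 = 12 + x + 3 * x**2 + 0.5 * x**3    # one right-hand side, shape (11,)
Y = np.column_stack([y1, 1 + x])       # two right-hand sides, shape (11, 2)

_, res1, _, _ = np.linalg.lstsq(M, y1, rcond=None)
_, res2, _, _ = np.linalg.lstsq(M, Y, rcond=None)

print(res1.shape, res2.shape)  # (1,) (2,)
```

One entry per column of y, so a single y still gives an array, just with one entry.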
So, that is why lstsq returns an array: it holds one sum of squared residuals for each y we fit. If we fit only one y, then there is only one sum in that array.
Since Polynomial.fit cannot be called with a 2D array as y, it is always in the case where there is only one sum in that array.
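If you just want the number, you can pull it out of the array yourself. A sketch, with some noise added (my addition, fixed seed chosen arbitrarily) so the residual is not just rounding error, and a manual sum of squares for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = np.linspace(-1, 1, 11)
y = 12 + x + 3 * x**2 + 0.5 * x**3 + rng.normal(scale=0.1, size=x.size)

poly, (resid, rank, sv, rcond) = np.polynomial.Polynomial.fit(x, y, 3, full=True)

sse = resid.item()                   # plain Python float (resid[0] also works)
manual = np.sum((y - poly(x)) ** 2)  # sum of squared residuals computed by hand

print(sse, manual)  # the two should agree up to rounding
```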