np.linalg.solve(X, Y) just isn't accurate enough. The code below solves a relatively small system of 5 equations:
import numpy as np

n = 5
Y = np.random.rand(n)

# Build the Vandermonde-style matrix: column c holds (1..n) raised to the power c
X = np.tile(np.array(range(1, n + 1)), n)
X = X.reshape((n, n), order='F')
for c in range(n):
    X[:, c] = X[:, c] ** c

A = np.linalg.solve(X, Y)
predicted_Y = X @ A

table = [(y, pred_y, y - pred_y) for y, pred_y in zip(Y, predicted_Y)]
print('y predicted_y difference')
for c1, c2, c3 in table:
    print("%.20f | %.20f | %.20f" % (c1, c2, c3))
The third column in the output shows that there are still differences between the actual Y values and the ones implied by the solved-for coefficients.
y predicted_y difference
0.68935295599312118586 | 0.68935295599312118586 | 0.00000000000000000000
0.72899266151307307027 | 0.72899266151307240413 | 0.00000000000000066613
0.18770646040141103494 | 0.18770646040141256150 | -0.00000000000000152656
0.02144867791874205398 | 0.02144867791873661389 | 0.00000000000000544009
0.54517050144884360297 | 0.54517050144883372198 | 0.00000000000000988098
I know the differences seem tiny, but I need a high degree of accuracy for what I'm doing. I don't mind if the code is slower, but I want it to be accurate to the 20th decimal place.
The only alternative I've seen is scipy.linalg.solve, which has the same problem. Is there an alternative package that will give a more accurate solution?
From how you construct your matrix, it seems you want to fit a polynomial through the points x = 1, 2, 3, ..., n. There are actually functions for exactly that in numpy: np.polyfit and np.poly1d. But they, too, only work in float64 precision, not in float128 precision.
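For comparison, here is a minimal sketch of that float64 route, assuming the same integer grid x = 1..n as in the question; the residuals stay around 1e-15, which is exactly the limit you are running into:

import numpy as np

n = 5
x = np.arange(1, n + 1)
Y = np.random.rand(n)

coeffs = np.polyfit(x, Y, deg=n - 1)   # fit a degree n-1 polynomial, highest power first
p = np.poly1d(coeffs)                  # callable polynomial
print(Y - p(x))                        # residuals on the order of 1e-15, not exactly zero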
What you could do, if you do not care about speed at all but want arbitrary precision, is to use sympy:
import sympy as sp

Xs = sp.Matrix(X)     # exact integer matrix
Xsinv = Xs.inv()      # exact inverse, entries are Rationals
The inverse, in your case, would then contain only fractions. Of course, as soon as you multiply with a float you lose the arbitrary precision, but it is still better than the original. If you only ever want to compute on those integer grid points, you could even pre-compute the inverse matrix.
Inverted matrix for n = 5 (shown as an image; see the sketch below to reproduce it):
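Here is a minimal sketch of how you could print that exact inverse yourself; np.vander with increasing=True is assumed here as a shortcut for the column-power loop in the question:

import numpy as np
import sympy as sp

n = 5
# increasing=True puts the columns in the order x**0, x**1, ..., x**(n-1)
X = np.vander(np.arange(1, n + 1), n, increasing=True)
Xs = sp.Matrix(X)
sp.pprint(Xs.inv())   # every entry is an exact Rational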
Edit:
np.float128 in numpy only gives you about 80 bits of precision, i.e. only about 20 decimal places. To make use of sympy's arbitrary precision, the multiplication with your float vector of random values must also be done with good precision, otherwise you lose it there. You can use sympy's arbitrary-precision floats for this, e.g. convert your rands into 100-digit sympy floats:
Y = np.random.rand(n)
Ys = [sp.Float(yy, 100) for yy in Y]
Your complete program would then look like this:
import numpy as np
import sympy as sp

n = 5
Y = np.random.rand(n)
Ys = [sp.Float(yy, 100) for yy in Y]   # 100-digit sympy floats

# Same Vandermonde-style matrix as before
X = np.tile(np.array(range(1, n + 1)), n)
X = X.reshape((n, n), order='F')
for c in range(n):
    X[:, c] = X[:, c] ** c

Xs = sp.Matrix(X)
Xsi = Xs.inv()                                    # exact inverse (Rationals)
Xsif = Xsi.applyfunc(lambda x: sp.Float(x, 100))  # 100-digit float version (not used below)

# A = np.linalg.solve(X, Y)
A = sp.Matrix(Xsi.dot(Ys))                        # exact inverse applied to the 100-digit Ys

predicted_Y = Xs * A
table = [(y, pred_y, y - pred_y) for y, pred_y in zip(Ys, predicted_Y)]
print('y predicted_y difference')
for c1, c2, c3 in table:
    print(sp.N(c1, 25), sp.N(c2, 25), sp.N(c3, 5))
Output:
y predicted_y difference
0.005484887985994779668885712 0.005484887985994779668885712 -2.6789e-101
0.5334767052647902962903004 0.5334767052647902962903004 -2.0002e-100
0.1227698863903042836298596 0.1227698863903042836298596 -1.2573e-99
0.6348196741030148748663464 0.6348196741030148748663464 -1.3716e-99
0.3946293586105372730443719 0.3946293586105372730443719 -2.7432e-99
So you have about 99 digits of accuracy.
Notice that your Y values look like:
0.4320677476357143165230922932096291333436965942382812500000000000000000000000000000000000000000000000
(because np.random.rand only gives you limited, double-precision values to begin with).
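If you also need the Y values themselves to carry more than double precision, one option (my assumption, not something the question asks for) is to draw them with mpmath, which sympy uses internally:

import mpmath
import sympy as sp

mpmath.mp.dps = 100                    # work with 100 significant digits
# mpmath.rand() returns a random mpf at the current working precision
Ys = [sp.Float(mpmath.rand(), 100) for _ in range(5)]
print(Ys[0])                           # a genuinely 100-digit random value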