In the r
programming language, the following
require(stats)
require(splines)
knots = quantile(women$height, seq(0.1,0.9,length.out = 5))
bs(women$height, knots=knots, degree=3)
returns
1 2 3 4 5 6 7 8
0.0000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.0000000000 0.00000000
0.6284418529 0.323939099 0.024295432 0.000000000 0.000000000 0.000000000 0.0000000000 0.00000000
0.2155814707 0.599894720 0.182883868 0.001639942 0.000000000 0.000000000 0.0000000000 0.00000000
0.0349854227 0.495626822 0.438289602 0.031098154 0.000000000 0.000000000 0.0000000000 0.00000000
0.0001619695 0.245586330 0.620809038 0.133442663 0.000000000 0.000000000 0.0000000000 0.00000000
0.0000000000 0.072886297 0.584548105 0.338678328 0.003887269 0.000000000 0.0000000000 0.00000000
0.0000000000 0.009110787 0.384718173 0.561892614 0.044278426 0.000000000 0.0000000000 0.00000000
0.0000000000 0.000000000 0.166666667 0.666666667 0.166666667 0.000000000 0.0000000000 0.00000000
0.0000000000 0.000000000 0.044278426 0.561892614 0.384718173 0.009110787 0.0000000000 0.00000000
0.0000000000 0.000000000 0.003887269 0.338678328 0.584548105 0.072886297 0.0000000000 0.00000000
0.0000000000 0.000000000 0.000000000 0.133442663 0.620809038 0.245586330 0.0001619695 0.00000000
0.0000000000 0.000000000 0.000000000 0.031098154 0.438289602 0.495626822 0.0349854227 0.00000000
0.0000000000 0.000000000 0.000000000 0.001639942 0.182883868 0.599894720 0.2155814707 0.00000000
0.0000000000 0.000000000 0.000000000 0.000000000 0.024295432 0.323939099 0.6284418529 0.02332362
0.0000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.0000000000 1.00000000
Is there a Python equivalent? I have tried BSpline from scipy
, but it requires that the coefficients already be known and be passed in.
How can I just generate the B-spline basis matrix, having passed in an array, knots, and degree?
To reproduce the input Python, you can do:
import numpy as np
women_height = np.array([58,59,60,61,62,63,64,65,66,67,68,69,70,71,72])
knots = array([59.4, 62.2, 65. , 67.8, 70.6])
Turning a comment into an answer, BSpline.design_matrix
is constructing what you are after, in the csr sparse format. It'll be available from scipy 1.8 when it is released. Until then, you can either grab the master branch of scipy, or use a workaround suggested by the docs (https://scipy.github.io/devdocs/reference/generated/scipy.interpolate.BSpline.design_matrix.html#scipy.interpolate.BSpline.design_matrix) :
t = ...
c = np.eye(len(t) - k - 1)
design_matrix_gh = BSpline(t, c, k)(x)
EDIT: R documentation, https://www.rdocumentation.org/packages/splines/versions/3.6.2/topics/bs, states that knots
argument is internal knots. scipy's BSpline
does not pad the knots automatically, so you need to do it yourself. Using the OP data:
In [22]: women_height = np.array([58,59,60,61,62,63,64,65,66,67,68,69,70,71,72])
...: knots = np.array([59.4, 62.2, 65. , 67.8, 70.6])
In [23]: t = np.r_[(58,)*4, knots, (72,)*4] # <<<<<< here
In [24]: m = BSpline.design_matrix(women_height, t, k=3)
In [25]: with np.printoptions(linewidth=120, precision=5):
...: print(m.toarray())
...:
[[1.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
[2.33236e-02 6.28442e-01 3.23939e-01 2.42954e-02 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 2.15581e-01 5.99895e-01 1.82884e-01 1.63994e-03 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 3.49854e-02 4.95627e-01 4.38290e-01 3.10982e-02 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 1.61970e-04 2.45586e-01 6.20809e-01 1.33443e-01 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 7.28863e-02 5.84548e-01 3.38678e-01 3.88727e-03 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 9.11079e-03 3.84718e-01 5.61893e-01 4.42784e-02 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 1.66667e-01 6.66667e-01 1.66667e-01 0.00000e+00 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 4.42784e-02 5.61893e-01 3.84718e-01 9.11079e-03 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 3.88727e-03 3.38678e-01 5.84548e-01 7.28863e-02 0.00000e+00 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.33443e-01 6.20809e-01 2.45586e-01 1.61970e-04 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 3.10982e-02 4.38290e-01 4.95627e-01 3.49854e-02 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.63994e-03 1.82884e-01 5.99895e-01 2.15581e-01 0.00000e+00]
[0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 2.42954e-02 3.23939e-01 6.28442e-01 2.33236e-02]
[0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 0.00000e+00 1.00000e+00]]
which looks similar to the R's output from the OP modulo the first column. From the R's docs it's not immediately clear how exactly it pads the knot vector, but if you want the same output, you can just chop the first column off (m.toarray()[1:, :]
or some such).