I'm having the worst time trying to find JavaScript code that would let me do cubic regressions. I would write it myself, but my understanding of polynomial math is, well, suboptimal.
So, here's what I'm looking for. Given an input of an array of arrays, where the internal array would be [x,y], the function would give me an output in the form of an array with four parameters - [a,b,c,d], where a, b, c, and d are parameters of the equation y = ax^3 + bx^2 + cx + d.
Example: Input is an array like this [[2,5],[5,10],[7,15],[12,20],[20,25],[32,30],[50,35]].
Which essentially is the representation of a table:
| x  | y  |
|----|----|
| 2  | 5  |
| 5  | 10 |
| 7  | 15 |
| 12 | 20 |
| 20 | 25 |
| 32 | 30 |
| 50 | 35 |
Now, the output would be [0.000575085,-0.058861065,2.183957502,1.127605507]. These are the a, b, c, and d parameters of the cubic function.
(FYI, I got this output from Excel's LINEST function, entered as an array formula on the above set of numbers with the x values raised to the powers {1,2,3}.)
How could this be done? Huge thanks in advance for any guidance.
Best, Tom
Here's a real, working bit of code that fits that cubic as a least squares problem, using the numeric.js library's uncmin unconstrained minimiser (jsbin here):
// Assumes numeric.js is already loaded and exposes the global `numeric`.
var data_x = [2,5,7,12,20,32,50];
var data_y = [5,10,15,20,25,30,35];

// Evaluate a*x^3 + b*x^2 + c*x + d, where params = [a,b,c,d].
var cubic = function(params,x) {
    return params[0] * x*x*x +
           params[1] * x*x +
           params[2] * x +
           params[3];
};

// Sum of squared differences between the fitted and the observed y values.
var objective = function(params) {
    var total = 0.0;
    for(var i=0; i < data_x.length; ++i) {
        var resultThisDatum = cubic(params, data_x[i]);
        var delta = resultThisDatum - data_y[i];
        total += (delta*delta);
    }
    return total;
};

// Starting guess for [a,b,c,d]; uncmin refines it to minimise the objective.
var initial = [1,1,1,1];
var minimiser = numeric.uncmin(objective,initial);

console.log("initial:");
for(var j=0; j<initial.length; ++j) {
    console.log(initial[j]);
}
console.log("minimiser:");
for(var j=0; j<minimiser.solution.length; ++j) {
    console.log(minimiser.solution[j]);
}
I get the results:
0.0005750849851827991
-0.05886106462847641
2.1839575020602164
1.1276055079334206
To explain: we have a function cubic, which evaluates the general cubic function for a set of parameters params and a value x. This function is wrapped to create the objective function, which takes a set of params, runs each x value from our data set through the target function, and returns the sum of the squared differences between the results and the corresponding y values. The objective function is passed to uncmin from numeric.js with a set of initial values; uncmin does the hard work and returns an object whose solution property contains the optimised parameter set.
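Because the cubic is linear in its parameters a, b, c and d, the same least squares problem can also be solved directly, without an iterative minimiser. The sketch below is not part of numeric.js and swaps uncmin for a different technique: it builds the normal equations and solves them by Gaussian elimination. The name cubicRegression is made up for illustration; it takes the [x,y] pairs from the question and returns [a,b,c,d]. It assumes at least four distinct x values and makes no attempt to cope with ill-conditioned data.

```javascript
// Hypothetical helper (not from numeric.js): least squares fit of
// y = a*x^3 + b*x^2 + c*x + d to an array of [x, y] pairs.
function cubicRegression(points) {
    var n = 4; // number of coefficients: a, b, c, d
    var r, c, col, k;

    // Augmented matrix [X^T X | X^T y] for the basis [x^3, x^2, x, 1].
    var m = [];
    for (r = 0; r < n; ++r) {
        m.push([0, 0, 0, 0, 0]);
    }
    for (k = 0; k < points.length; ++k) {
        var x = points[k][0], y = points[k][1];
        var basis = [x * x * x, x * x, x, 1];
        for (r = 0; r < n; ++r) {
            for (c = 0; c < n; ++c) {
                m[r][c] += basis[r] * basis[c];
            }
            m[r][n] += basis[r] * y;
        }
    }

    // Gaussian elimination with partial pivoting.
    for (col = 0; col < n; ++col) {
        var pivot = col;
        for (r = col + 1; r < n; ++r) {
            if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                pivot = r;
            }
        }
        var tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
        if (m[col][col] === 0) {
            throw new Error("singular system: need at least 4 distinct x values");
        }
        for (r = col + 1; r < n; ++r) {
            var factor = m[r][col] / m[col][col];
            for (c = col; c <= n; ++c) {
                m[r][c] -= factor * m[col][c];
            }
        }
    }

    // Back substitution, from the last row upwards.
    var coeffs = [0, 0, 0, 0];
    for (r = n - 1; r >= 0; --r) {
        var sum = m[r][n];
        for (c = r + 1; c < n; ++c) {
            sum -= m[r][c] * coeffs[c];
        }
        coeffs[r] = sum / m[r][r];
    }
    return coeffs; // [a, b, c, d]
}

var points = [[2,5],[5,10],[7,15],[12,20],[20,25],[32,30],[50,35]];
var coeffs = cubicRegression(points);
console.log(coeffs);
```

On the data from the question this should agree with the uncmin and LINEST figures to several decimal places.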
To do this without the global variables (naughty!), you can have an objective function factory thus:
var makeObjective = function(targetFunc,xlist,ylist) {
    var objective = function(params) {
        var total = 0.0;
        for(var i=0; i < xlist.length; ++i) {
            var resultThisDatum = targetFunc(params, xlist[i]);
            var delta = resultThisDatum - ylist[i];
            total += (delta*delta);
        }
        return total;
    };
    return objective;
};
Which you can use to manufacture objective functions:
var objective = makeObjective(cubic, data_x, data_y); // then carry on as before
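As a quick sanity check that does not need numeric.js, you can evaluate a manufactured objective at the starting guess and at the fitted parameters and compare the two sums of squares. This is only a sketch: the definitions are repeated so it runs on its own, and fitted is simply the result quoted above, rounded.

```javascript
// Same definitions as in the answer, repeated so this snippet is standalone.
var cubic = function(params, x) {
    return params[0] * x*x*x +
           params[1] * x*x +
           params[2] * x +
           params[3];
};

var makeObjective = function(targetFunc, xlist, ylist) {
    return function(params) {
        var total = 0.0;
        for (var i = 0; i < xlist.length; ++i) {
            var delta = targetFunc(params, xlist[i]) - ylist[i];
            total += delta * delta;
        }
        return total;
    };
};

var data_x = [2,5,7,12,20,32,50];
var data_y = [5,10,15,20,25,30,35];
var objective = makeObjective(cubic, data_x, data_y);

var initial = [1, 1, 1, 1];
// Parameters reported by uncmin above, rounded to nine decimal places.
var fitted = [0.000575085, -0.058861065, 2.183957502, 1.127605507];

console.log("sum of squares at initial guess:", objective(initial));
console.log("sum of squares at fitted params:", objective(fitted));
```

The second figure should be far smaller than the first; if it is not, the data or the target function has probably been mixed up.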
Knowing how to do this practically would be of great help to a lot of people, so I'm glad this has come up.
Edit: Clarification on cubic
var cubic = function(params,x) {
    return params[0] * x*x*x +
           params[1] * x*x +
           params[2] * x +
           params[3];
};
cubic is defined as a function which takes an array of parameters params and a value x. Given params, we can define a function f(x). For a cubic, that is f(x) = a x^3 + b x^2 + c x + d, so there are 4 parameters ([0] to [3]), and given those 4 param values we have a single function f(x) with 1 input x.
The code is structured to allow you to replace cubic with another function of the same structure; it could be linear with 2 parameters:
var linear = function(params, x) {
    return params[0]*x + params[1];
};
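The rest of the code does not need to be told how many parameters there are: the number being optimised is the length of the initial array passed to uncmin, so for linear you would pass two starting values (for example [1,1]) instead of four.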
Note that this whole piece of code is trying to find the set of parameter values which produce a curve which best fits all the data; if you wanted to find a fit for the last 4 points of some data, you would pass only those values in data_x and data_y.
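If your data arrives as [x,y] pairs, as in the question, one way to get the two arrays and then keep only the last 4 points is sketched below. The variable names points, last_x and last_y are just illustrative.

```javascript
var points = [[2,5],[5,10],[7,15],[12,20],[20,25],[32,30],[50,35]];

// Split the [x, y] pairs into the two parallel arrays the answer uses.
var data_x = points.map(function(p) { return p[0]; });
var data_y = points.map(function(p) { return p[1]; });

// slice(-4) copies the last four elements and leaves the originals untouched.
var last_x = data_x.slice(-4); // [12, 20, 32, 50]
var last_y = data_y.slice(-4); // [20, 25, 30, 35]

console.log(last_x, last_y);
```

You would then build the objective from last_x and last_y instead of the full arrays. Bear in mind that a cubic has 4 parameters, so with only 4 points (at distinct x values) the best fit passes through every point exactly.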