Consider a nonlinear least squares model in R, for example of the following form:
y ~ theta / ( 1 + exp( -( alpha + beta * x) ) )
(My real problem has several variables, and the outer function is not logistic but somewhat more involved; this one is simpler, but I think if I can do this, my case should follow almost immediately.)
I'd like to replace the term "alpha + beta * x" with (say) a natural cubic spline.
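Written out, the model being asked for looks like this. Here N_1, ..., N_k stand for the k columns of the basis returned by ns(x, df = k) and b_1, ..., b_k for their coefficients. This notation is an illustration, not taken from the question:

$$
y_i = \frac{\theta}{1 + \exp\!\left(-\left(\alpha + \sum_{j=1}^{k} b_j N_j(x_i)\right)\right)} + \varepsilon_i
$$

With k = 1 and N_1(x) = x, this reduces to the original model, with b_1 playing the role of beta. In matrix form the inner term is alpha + N b, where N is the n x k basis matrix and b the coefficient vector.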
Here's some code to create example data with a nonlinear function inside the logistic:
set.seed(438572L)
x <- seq(1,10,by=.25)
y <- 8.6/(1+exp( -(-3+x/4.4+sqrt(x*1.1)*(1.-sin(1.+x/2.9))) )) + rnorm(length(x), sd=0.2)
Without the logistic around it, if I were using lm, I could easily replace a linear term with a spline term. So a linear model like this:
lm( y ~ x )
then becomes
library("splines")
lm( y ~ ns( x, df = 5 ) )
Generating fitted values is simple, and getting predicted values with the aid of (for example) the rms package seems simple enough.
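For the lm case, base R plus splines already handles prediction. Below is a minimal sketch: when ns() appears inside a model formula, its knots are recorded in the model's terms, so predict() rebuilds the same basis at new x values. The four new x values are arbitrary illustrations.

```r
library(splines)

# Toy data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

fit_lm <- lm(y ~ ns(x, df = 5))

# ns() in a model formula stores its knots (in the terms' "predvars"),
# so predict() evaluates the same basis at the new x values.
xnew <- data.frame(x = c(1.1, 4.3, 7.75, 9.9))
pred_lm <- predict(fit_lm, newdata = xnew)
pred_lm
```

Predicting at the original x values reproduces the fitted values. This is the property that is lost once the basis columns are copied out by hand for nls.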
Indeed, fitting the original data with that lm-based spline fit isn't too bad, but there's a reason I need it inside the logistic function (or rather, the equivalent in my problem).
The problem with nls is that I need to provide names for all the parameters. I'm quite happy calling them, say, b1, ..., b5 for one spline fit (and, say, c1, ..., c6 for another variable; I'll need to be able to make several of them).
Is there a reasonably neat way to generate the corresponding formula for nls so that I can replace the linear term inside the nonlinear function with a spline?
The only ways I can figure out to do it are a bit awkward and clunky, and they don't generalize nicely without writing a whole bunch of code.
(Edit for clarification) For this small problem, I can of course do it by hand: write out an expression for the inner product of the matrix generated by ns with the vector of parameters. But then I have to write the whole thing out term by term again for each spline in every other variable, again every time I change the df in any of the splines, and again if I want to use cs instead of ns. And then when I want to do some prediction (or interpolation), a whole new slew of issues has to be dealt with. I need to keep doing this over and over, potentially for a substantially larger number of knots, over several variables, for analysis after analysis. So I wondered if there was a neater, simpler way than writing out each individual term, without having to write a great deal of code. I can see a fairly bull-at-a-gate way to do it that would take a fair bit of code to get right, but this being R, I suspect there's a much neater way (or more likely 3 or 4 neater ways) that's simply eluding me. Hence the question.
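One way to avoid writing the inner product out term by term is sketched below under some assumptions. nls accepts vector-valued parameters, so the inner term can stay as a matrix product B %*% b with a single coefficient vector b. The basis matrix is passed in a list as data, because its length differs from that of y. The starting-value heuristic is my own assumption and may need adjusting for other data: it puts theta a little above max(y) and gets alpha and b from a linear fit on the logit scale.

```r
library(splines)

# Toy data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

B <- ns(x, df = 5)   # n x 5 natural cubic spline basis (no intercept column)

# Heuristic starting values (an assumption): put theta a little above max(y),
# then regress the logit of y/theta on the basis to get alpha and b.
theta0 <- 1.05 * max(y)
z   <- qlogis(pmin(pmax(y / theta0, 0.01), 0.99))
lin <- unname(coef(lm(z ~ B)))   # intercept, then one coefficient per column

# b is a single vector-valued parameter, so this formula does not depend on df.
# data is a list (not a data frame) because B is longer than y.
fit <- nls(y ~ theta * plogis(alpha + drop(B %*% b)),
           data  = list(y = y, B = B),
           start = list(theta = theta0, alpha = lin[1], b = lin[-1]))

coef(fit)   # named theta, alpha, b1, ..., b5
```

With this form, changing df, or swapping ns() for bs(), only changes the line that builds B, because the length of the start vector comes from the lm fit. A second variable would add another term such as drop(B2 %*% c) with its own vector parameter c. For prediction, passing newdata = list(B = predict(B, xnew)) to predict() should evaluate the same basis at new x, since predict() on an ns object reuses its knots.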
I thought I had seen someone do something like this in the past in a fairly nice way, but for the life of me I can't find it now; I've tried a bunch of times to locate it.
[More particularly, I'd generally like to be able to try fitting any of several different splines in each variable (to try a couple of possibilities) in order to see if I can find a simple model that still fits adequately for the purpose (noise is really quite low; some bias in the fit is okay to achieve a nice smooth result, but only up to a point). It's more 'find a nice, interpretable, but adequately fitting function' than anything approaching inference, and data mining isn't really an issue for this problem.]
Alternatively, if this would be much easier in, say, gnm or ASSIST or one of the other packages, that would be useful knowledge, but then some pointers on how to proceed with them on the toy problem above would help.
ns actually generates a matrix of predictors. What you can do is split that matrix out into individual variables and feed them to nls.
m <- ns(x, df=5)
df <- data.frame(y, m) # X-variables will be named X1, ... X5
# starting values should be set as appropriate for your data
nls(y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5), data=df,
start=list(theta=1, alpha=0, b1=1, b2=1, b3=1, b4=1, b5=1))
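To predict at new x from a fit like the one above, the new values have to go through the same basis (same knots). predict() on the ns object does that, and data.frame() then names the columns X1, ..., X5, just as in the fit. The sketch below replaces the all-ones start with heuristic starting values, which are my own assumption. The new x values are arbitrary.

```r
library(splines)

# Toy data from the question
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)

m   <- ns(x, df = 5)
dat <- data.frame(y, m)   # basis columns are named X1, ..., X5

# Heuristic starting values (an assumption), instead of all ones
theta0 <- 1.05 * max(y)
z <- qlogis(pmin(pmax(y / theta0, 0.01), 0.99))
s <- unname(coef(lm(z ~ X1 + X2 + X3 + X4 + X5, data = dat)))

fit <- nls(y ~ theta * plogis(alpha + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5),
           data = dat,
           start = list(theta = theta0, alpha = s[1], b1 = s[2], b2 = s[3],
                        b3 = s[4], b4 = s[5], b5 = s[6]))

# predict() on the ns object reuses the knots from the fit; data.frame()
# again turns its columns "1", ..., "5" into X1, ..., X5.
xnew   <- c(1.1, 4.3, 7.75, 9.9)
newdat <- data.frame(predict(m, xnew))
predict(fit, newdata = newdat)
```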
ETA: here's a go at automating this for different values of df. This constructs the formula using text munging, and then uses do.call to call nls. Caveat: untested.
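library(splines)

my.nls <- function(x, y, df)
{
  m <- ns(x, df=df)
  # ns() names its columns "1", "2", ..., which a formula would read as
  # numeric constants, so rename them X1, X2, ... before building the formula
  xn <- paste("X", seq_len(ncol(m)), sep="")
  colnames(m) <- xn
  b <- paste("b", seq_along(xn), sep="")
  fm <- formula(paste("y ~ theta * plogis(alpha + ",
                      paste(b, xn, sep="*", collapse=" + "), ")", sep=""))
  # starting values should be set as appropriate for your data
  start <- c(1, 1, rep(1, length(b)))
  names(start) <- c("theta", "alpha", b)
  do.call(nls, list(fm, data=data.frame(y, m), start=start))
}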
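The same text-munging idea can be extended to several variables, each with its own coefficient prefix (b1, ..., c1, ...), as in the sketch below. build_spline_nls() is a hypothetical helper, not from any package. w is a made-up second covariate, used only to show the generated formula and parameter names; nothing is fitted here.

```r
library(splines)

# Hypothetical helper: 'bases' is a named list of basis matrices (one per
# variable) and 'prefixes' a named list of coefficient prefixes. Returns the
# nls formula, a data frame holding y and the basis columns, and the
# parameter names (useful for building a start list).
build_spline_nls <- function(y, bases, prefixes) {
  dat   <- data.frame(y = y)
  inner <- "alpha"
  pars  <- c("theta", "alpha")
  for (v in names(bases)) {
    B  <- bases[[v]]
    cn <- paste0(v, "_", seq_len(ncol(B)))          # column names: x_1, x_2, ...
    pn <- paste0(prefixes[[v]], seq_len(ncol(B)))   # parameter names: b1, b2, ...
    for (j in seq_along(cn)) dat[[cn[j]]] <- B[, j]
    inner <- paste(inner, paste(pn, cn, sep = " * ", collapse = " + "), sep = " + ")
    pars  <- c(pars, pn)
  }
  list(formula = as.formula(paste0("y ~ theta * plogis(", inner, ")")),
       data = dat, pars = pars)
}

# Toy data from the question, plus a made-up second covariate w
set.seed(438572L)
x <- seq(1, 10, by = 0.25)
y <- 8.6 / (1 + exp(-(-3 + x/4.4 + sqrt(x*1.1) * (1 - sin(1 + x/2.9))))) +
  rnorm(length(x), sd = 0.2)
w <- runif(length(x), 0, 5)   # hypothetical second variable

spec <- build_spline_nls(y,
                         bases    = list(x = ns(x, df = 5), w = ns(w, df = 6)),
                         prefixes = list(x = "b", w = "c"))
spec$formula
spec$pars   # theta, alpha, b1..b5, c1..c6
```

A start list named by spec$pars, for example setNames(as.list(values), spec$pars), can then be passed with nls(spec$formula, data = spec$data, start = ...). The same caveat about choosing sensible starting values applies.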