For a range of values, I want to identify where changes in distribution happen and where that change is largest. Currently I run a kernel maximum mean discrepancy (MMD) test at every position in the range, comparing the 200 values before that position with the 200 values after it, and then I extract the locations where the MMD statistic is at its maximum. This is very computationally intensive in R. Please note that I am using kernlab to calculate kmmd. Is there a way to do this faster? Any suggestions or help would be appreciated.
My code is:
cvg<-seq(1,2000)
cvg<-cvg^3-2*cvg^2+5*cvg
myRange<-seq(400:(length(cvg)-400))
kernel<-"splinedot"
cvg[201:(length(cvg)-200)]->cvg
myRange<-seq(400:(length(cvg)-400))
lapply(myRange, function(x) mmdstats(kmmd((as.matrix(cvg[(x+1):(x+400)])), (as.matrix(cvg[(x+801):(x+1200)])), kernel=kernel)))->kmm.ls
as.data.frame(as.matrix(kmm.ls))->kmm.ls
lapply(kmm.ls, function(x) which.max(mmdstats(x)))->store.max
I should say up front that I am not an expert on kernlab, so I cannot judge the correctness of your analysis or improve your code.
However, I can suggest converting your lapply call to a parallelized version such as sfLapply, parLapply, mclapply, future_lapply, etc.
Here is an example with sfLapply from the snowfall package (which is really straightforward, in my opinion):
# Your original lapply call took 500 seconds on my PC
system.time(kmm.ls <- lapply(myRange, function(x)
  mmdstats(kmmd(as.matrix(cvg[(x+1):(x+400)]), as.matrix(cvg[(x+801):(x+1200)]),
                kernel=kernel))))
library(kernlab)
library(snowfall)
sfInit(parallel=TRUE,cpus = parallel::detectCores()-1)
# Load the required packages on every worker in the cluster
sfLibrary(kernlab)
# Export all variables to every worker in the cluster
sfExportAll()
# Run the parallelized lapply with the custom function
# sfLapply took 22 seconds on my 48-core PC
system.time(kmm.ls <- sfLapply(myRange, function(x)
mmdstats(kmmd((as.matrix(cvg[(x+1):(x+400)])), (as.matrix(cvg[(x+801):(x+1200)])),
kernel=kernel))))
# Stop the cluster
sfStop()
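If you would rather avoid an extra dependency, the same pattern can be sketched with parLapply from the base parallel package. The helper names below (window_stat, mean_gap) and the toy data are my own illustrative assumptions, and mean_gap is only a cheap placeholder statistic so the sketch runs without kernlab; it is not the MMD.

```r
library(parallel)

# Hypothetical worker: compare two windows of `values` for one start index x.
# Window A is values[(x+1):(x+width)]; window B starts `offset` positions later.
window_stat <- function(x, values, width, offset, stat_fun) {
  a <- values[(x + 1):(x + width)]
  b <- values[(x + offset + 1):(x + offset + width)]
  stat_fun(a, b)
}

# Placeholder statistic (absolute difference of means); this is NOT the MMD.
mean_gap <- function(a, b) abs(mean(a) - mean(b))

# Toy data with a single jump between positions 50 and 51.
values <- c(rep(0, 50), rep(10, 50))
starts <- 0:79

cl <- makeCluster(2)  # two workers; adjust to your machine
res <- parLapply(cl, starts, window_stat,
                 values = values, width = 10, offset = 10, stat_fun = mean_gap)
stopCluster(cl)

# Start index whose two windows differ the most.
best_start <- starts[which.max(unlist(res))]
```

To mirror your code, stat_fun could be something like function(a, b) kernlab::mmdstats(kernlab::kmmd(as.matrix(a), as.matrix(b), kernel = "splinedot")) with width = 400 and offset = 800. I have not timed that variant, so treat it as a starting point only.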
The sfLapply example above covers only the first lapply call of your code, but the same idea can be applied to the second call. When I tried to run your code, the second lapply call gave me this error:
Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘mmdstats’ for signature ‘"list"’
It doesn't seem to be a critical error, but as I said, I don't feel qualified to advise on how to fix it properly.
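As a tentative, unverified sketch only: the error message suggests that mmdstats is being called on a plain list rather than on a kmmd object. If each element of kmm.ls is already the numeric vector returned by mmdstats (an assumption on my part), the peak could be located without calling mmdstats a second time. The list below is made-up stand-in data, and treating the first entry of each vector as the statistic of interest is also an assumption.

```r
# Stand-in for kmm.ls: a list of numeric vectors (made-up numbers, not real MMD output).
kmm_list <- list(c(0.10, 0.01), c(0.45, 0.20), c(0.30, 0.09))

# Take the first entry of each vector (assumed to be the statistic of interest).
first_stat <- vapply(kmm_list, function(s) s[1], numeric(1))

# Position in the list where that statistic peaks.
peak_index <- which.max(first_stat)
```

With the real results, peak_index would be a position within myRange, so myRange[peak_index] would give the corresponding start index.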