I have built a function in R (running on Ubuntu 12.04 LTS 64bit, 4 core i7 server with multithreading and 6gb ram) where I've installed R using the standard packages:
sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc
NB: I also installed foreach
& doMC
inside R (which didn't help either), like I installed the deldir
package:
install.packages(c("deldir"), dependencies = TRUE)
My function runs fine, but it does not use parallel cores (just maxes out 1 of the 8):
library(deldir)
library(foreach)
library(doMC)
registerDoMC(cores=8)
#getDoParWorkers()
#getDoParName()
#getDoParVersion()
# loop through files
inputfiles <- dir(path="/home/geoadmin/data/objects/", pattern='.txt')
for( inputfilenr in 1:length(inputfiles))
{
# set file variables
curinputfile = paste("/home/geoadmin/data/objects/",inputfiles[[inputfilenr]], sep = "", collapse = NULL)
print (curinputfile)
curoutputfile = paste("/home/geoadmin/data/objects/",substr(inputfiles[[inputfilenr]], start=1, stop=10), '.out', sep = "", collapse = NULL)
# select the point x/y coordinates into a data frame...
points <- read.csv(curinputfile, header = TRUE, sep = ",", dec=".", fill = TRUE)
# set calculation variables, precision on 3 digits only because of the RDW coordinate system
voro = deldir(points$x, points$y, digits=3, list(ndx=2,ndy=2), rw=c(min(points$x)-abs(min(points$x)-max(points$x)), max(points$x)+abs(min(points$x)-max(points$x)), min(points$y)-abs(min(points$y)-max(points$y)), max(points$y)+abs(min(points$y)-max(points$y))))
tiles = tile.list(voro)
poly = array()
# start loop
poly <- foreach (i=1:length(tiles), .combine=cbind) %dopar%
{
# load tile info
tile = tiles[[i]]
# start with EWKB notation
curpoly = "POLYGON(("
# add list of coordinates by looping through the points in tile
for (j in 1:length(tiles[[i]]$x)) { curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]]) }
# then again the first point to close the polygon and end the EWKB notation, adding that to the poly array
sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])
}
write.csv(t(poly), file = curoutputfile, row.names = FALSE)
}
So the results are good, but no parallelism...
doMC
did register correctly:
> getDoParWorkers()
[1] 8
> getDoParName()
[1] "doMC"
> getDoParVersion()
[1] "1.2.5"
If I look at the usage (with top
):
top - 01:03:19 up 9 min, 3 users, load average: 1.02, 0.86, 0.45
Tasks: 131 total, 2 running, 127 sleeping, 0 stopped, 2 zombie
Cpu(s): 12.5%us, 0.0%sy, 0.0%ni, 87.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 6104932k total, 1240512k used, 4864420k free, 16656k buffers
Swap: 6283260k total, 0k used, 6283260k free, 141996k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1553 zzzzzzzz 20 0 913m 850m 3716 R 100 14.3 8:22.03 R
So just maxing out one core. Does anyone have any idea what could cause foreach
/doMC
to not use multiple cores?
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] doMC_1.2.5 multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
[5] deldir_0.0-19
loaded via a namespace (and not attached):
[1] codetools_0.2-8
To add the likely answer for the question: As foreach/mc does work on the computer itself (with the standard example), it's the specific code itself and likely that the voro=deldir part takes up the time, not the loop after it. This however means that the deldir package needs to be adjusted. Looking at the code in the DelDir source it seems I would need to adjust this snippet in the code:
# Call the master subroutine to do the work:
repeat {
tmp <- .Fortran(
'master',
x=as.double(x),
y=as.double(y),
sort=as.logical(sort),
rw=as.double(rw),
npd=as.integer(npd),
ntot=as.integer(ntot),
nadj=integer(tadj),
madj=as.integer(madj),
ind=integer(npd),
tx=double(npd),
ty=double(npd),
ilist=integer(npd),
eps=as.double(eps),
delsgs=double(tdel),
ndel=as.integer(ndel),
delsum=double(ntdel),
dirsgs=double(tdir),
ndir=as.integer(ndir),
dirsum=double(ntdir),
nerror=integer(1),
PACKAGE='deldir'
)
Not sure yet how i can format this into a thing which would work with foreach though...