So, basically I have a data frame with lots of 'sets of positions' of items, and I want to calculate a distance matrix for each set of items. I could do this with a for loop that appends to a list, but I think there must be a more elegant method using dplyr, purrr or similar; I'm just drawing a complete blank on how to proceed.
So let's assume my data frame looks like this:
df <- data.frame(
trial = c(rep(1,3),rep(2,5),rep(3,7)),
object_name = c("stapler", "bottle", "cup", "ball", "chocolate","tape","pen","bowl","stapler", "bottle", "cup", "ball", "tape","pen","bowl"),
posX = c(0.1,0.2,0.3,0.3,0.2,0.5,-0.4,-0.1,0.8,-0.3,-0.4,0.3,0.2,0,-0.2),
posY = c(-0.2,0.5,0.3,0.9,-0.3,-0.1,0,0.6,-0.7,-1,0.2,0.3,-0.8,0.6,1)
)
That is, object names may repeat from trial to trial, but each trial has a different number of objects. My goal is to calculate a (Euclidean) distance matrix from posX and posY for each trial, so the matrices will differ in size, ranging from 3x3 to 7x7. Ideally I'd store each trial's matrix in a single cell of a data frame, but I'm not sure this is even possible. If not, a list containing one distance matrix per trial would also work.
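For reference, the Euclidean distance between two objects i and j in the same trial is computed from their coordinates as below. The worked value for stapler and bottle in trial 1 matches the first entry printed in the answer's output.

```latex
d_{ij} = \sqrt{(\mathrm{posX}_i - \mathrm{posX}_j)^2 + (\mathrm{posY}_i - \mathrm{posY}_j)^2}

d_{\text{stapler},\,\text{bottle}} = \sqrt{(0.1 - 0.2)^2 + (-0.2 - 0.5)^2} = \sqrt{0.01 + 0.49} = \sqrt{0.5} \approx 0.7071068
```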
Thanks for any help!
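You can use lapply after you split df by trial, which lets you calculate the distances per trial. By default dist() returns a "dist" object that stores only the lower triangle, which is what is printed below.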
lapply(split(df, df$trial), function(x) dist(cbind(x$posX,x$posY)))
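# For full square matrices instead of "dist" objects, wrap dist() in as.matrix()
# (diag = TRUE, upper = TRUE only change how a "dist" object is printed):
#lapply(split(df, df$trial), function(x) as.matrix(dist(cbind(x$posX, x$posY))))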
#$`1`
#          1         2
#2 0.7071068
#3 0.5385165 0.2236068
#
#$`2`
#          1         2         3         4
#2 1.2041595
#3 1.0198039 0.3605551
#4 1.1401754 0.6708204 0.9055385
#5 0.5000000 0.9486833 0.9219544 0.6708204
#
#$`3`
#          1         2         3         4         5         6
#2 1.1401754
#3 1.5000000 1.2041595
#4 1.1180340 1.4317821 0.7071068
#5 0.6082763 0.5385165 1.1661904 1.1045361
#6 1.5264338 1.6278821 0.5656854 0.4242641 1.4142136
#7 1.9723083 2.0024984 0.8246211 0.8602325 1.8439089 0.4472136
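Since the question also asks about keeping each matrix in a single cell of a data frame, here is a minimal base-R sketch of that. It builds a full matrix per trial with as.matrix(dist(...)), labels rows and columns with object_name (assuming names are unique within a trial, as they are in the example data), and stores the results in a list-column. The names per_trial, dist_list, res and dist_mat are only illustrative.

```r
# Example data from the question
df <- data.frame(
  trial = c(rep(1, 3), rep(2, 5), rep(3, 7)),
  object_name = c("stapler", "bottle", "cup", "ball", "chocolate", "tape", "pen", "bowl",
                  "stapler", "bottle", "cup", "ball", "tape", "pen", "bowl"),
  posX = c(0.1, 0.2, 0.3, 0.3, 0.2, 0.5, -0.4, -0.1, 0.8, -0.3, -0.4, 0.3, 0.2, 0, -0.2),
  posY = c(-0.2, 0.5, 0.3, 0.9, -0.3, -0.1, 0, 0.6, -0.7, -1, 0.2, 0.3, -0.8, 0.6, 1)
)

# Split into one data frame per trial (list names are the trial values)
per_trial <- split(df, df$trial)

# Full Euclidean distance matrix per trial, rows/columns labelled by object name
# (assumes object_name is unique within each trial)
dist_list <- lapply(per_trial, function(x) {
  m <- as.matrix(dist(x[, c("posX", "posY")]))
  dimnames(m) <- list(x$object_name, x$object_name)
  m
})

# One row per trial; each cell of the list-column dist_mat holds one matrix
res <- data.frame(trial = as.numeric(names(dist_list)))
res$dist_mat <- dist_list
```

Each cell of res$dist_mat then holds one matrix of the appropriate size; for example, res$dist_mat[[3]]["stapler", "bowl"] gives 1.972308, the same value as row 7, column 1 of trial 3 above. If you prefer the tidyverse, nesting by trial with tidyr::nest() and adding the column with purrr::map() inside dplyr::mutate() should give the same kind of list-column in a tibble.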