I have a dataset that consists of various individuals' ratings of a bunch of variables. Each individual, differentiated by unique ID numbers, rated each of the variables for two targets: for themselves (target = s) and someone else (target = o). A simplified mock-up of the dataframe looks like this:
id <- c("123", "123", "234", "234", "345", "345", "456", "456", "567", "567")
target <- c("s", "o", "s", "o", "s", "o", "s", "o", "s", "o")
v1 <- c(1, 2, 3, 7, 2, 5, 4, 4, 1, 3)
v2 <- c(7, 6, 5, 7, 1, 3, 5, 4, 1, 1)
v3 <- c(2, 2, 2, 4, 5, 2, 7, 1, 3, 3)
df <- data.frame(id, target, v1, v2, v3)
I want to find the Euclidean distance between each individual's self rating and other person rating across all the variables. Ideally, I want the end result to look kind of like this:
id <- c("123", "234", "345", "456", "567")
euclidean_distance <- c(1.414214, 4.898979, 4.690416, 6.082763, 2)
df_final <- data.frame(id, euclidean_distance)
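To make the target concrete, here is a quick hand check of the first expected value in base R. The names self_123, other_123, and dist_123 are only illustrative, and the ratings are copied from the mock-up above.

```r
# Ratings for id 123, copied from the mock-up data frame
self_123  <- c(v1 = 1, v2 = 7, v3 = 2)  # target = s
other_123 <- c(v1 = 2, v2 = 6, v3 = 2)  # target = o

# Euclidean distance: square root of the summed squared differences
dist_123 <- sqrt(sum((self_123 - other_123)^2))
dist_123
```

This is sqrt(2), which matches the first entry of euclidean_distance.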
An example of how I'm doing this for one individual would be:
library(dplyr)

id_123 <- df %>%
  filter(id == "123")
dist(select(id_123, v1:v3))
However, this takes a long time to do one at a time (my actual data set has hundreds of individuals, not just 5) and I'm more likely to make transcription mistakes doing all of this one at a time, by hand. So I'm trying to figure out a way to iterate through all the individuals (so, every unique ID number) to get each individual's one Euclidean distance output value.
Do you have any suggestions about how to achieve this? Any help greatly appreciated!
Edit: In hindsight, I prefer @thelatemail's answer, which summarises by group.
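For readers who have not seen that approach, a grouped summary might look roughly like the sketch below. This is my own minimal version, not necessarily @thelatemail's exact code, and it assumes every id has exactly two rows (one s, one o) so that dist() returns a single value per group.

```r
library(dplyr)

# Mock-up data from the question
df <- data.frame(
  id     = rep(c("123", "234", "345", "456", "567"), each = 2),
  target = rep(c("s", "o"), times = 5),
  v1     = c(1, 2, 3, 7, 2, 5, 4, 4, 1, 3),
  v2     = c(7, 6, 5, 7, 1, 3, 5, 4, 1, 1),
  v3     = c(2, 2, 2, 4, 5, 2, 7, 1, 3, 3)
)

df_final <- df %>%
  group_by(id) %>%
  summarise(
    # cbind() builds a 2 x 3 matrix for the current id (assumption: two rows per id);
    # dist() then returns the one pairwise distance between those two rows
    euclidean_distance = as.numeric(dist(cbind(v1, v2, v3)))
  )

df_final
```

With more variables, the cbind() call would need to list them all, so some form of column selection may be more convenient there.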
Here is a solution with purrr::map(). It is not exactly a loop (you can read about functionals in Advanced R). The ~ .x syntax is outdated; comments are welcome so I can improve it!
library(tidyverse)

df %>%
  # one data frame per id
  split(.$id) %>%
  # distance between the two rows of each id, as a one-column tibble
  map(~ .x %>%
    select(v1:v3) %>%
    dist() %>%
    as.numeric() %>%
    as_tibble_col(column_name = "euclidean_distance")) %>%
  list_rbind(names_to = "id")
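One possible way to drop the formula shorthand is the anonymous-function syntax \(x), sketched below. This assumes R 4.1 or later and purrr 1.0.0 or later (already needed for list_rbind()). It loads dplyr, purrr, and tibble individually, which library(tidyverse) would also attach, and it again assumes two rows per id.

```r
library(dplyr)
library(purrr)
library(tibble)

# Mock-up data from the question
df <- data.frame(
  id     = rep(c("123", "234", "345", "456", "567"), each = 2),
  target = rep(c("s", "o"), times = 5),
  v1     = c(1, 2, 3, 7, 2, 5, 4, 4, 1, 3),
  v2     = c(7, 6, 5, 7, 1, 3, 5, 4, 1, 1),
  v3     = c(2, 2, 2, 4, 5, 2, 7, 1, 3, 3)
)

df_final <- df %>%
  split(.$id) %>%
  # \(x) is shorthand for function(x); x is the two-row data frame for one id
  map(\(x) as_tibble_col(
    as.numeric(dist(select(x, v1:v3))),
    column_name = "euclidean_distance"
  )) %>%
  # stack the per-id tibbles, keeping the list names as the id column
  list_rbind(names_to = "id")

df_final
```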
Nice minimal reproducible example by the way :)