I have two dataframes:
Node1<-c(1, 2, 3, 4, 5, 6, 7)
Node1<-c(1, 1, 1, 2, 2, 3)
Node2<-c(2, 3, 4, 3, 4, 4)
Mean<-c(2, 1, 5, 3, 2, 3)
SD<-c(0.1, 0.2, 0.01, 0.2, 0.01, 0.01)
df1<-data.frame(Node1, Node2, Mean, SD)
df1
Node1 Node2 Mean SD
1 1 2 2 0.10
2 1 3 1 0.20
3 1 4 5 0.01
4 2 3 3 0.20
5 2 4 2 0.01
6 3 4 3 0.01
df2<-data.frame(Node1, Node2, value)
df2
Node1 Node2 value
1 1 2 1
2 1 3 2
3 1 4 1
4 2 3 4
5 2 4 6
6 3 4 7
For each "value" row in df2, I want to calculate the z-score using the Mean and standard deviation (SD) from the corresponding Node pair (i.e., Node1 and Node2) from df1 as the reference values. E.g., for the first row from df2, the z-score should be ((1-2)/0.10) as the corresponding mean and SD are derived by matching Node1 and Node2 values from both dfs.
How do I perform this in R? My current solutions are using left_join
to combine df1 and df2, then use mutate
and case_when
but I'm unable to go beyond this to extract the z-scores while retaining specific conditions (i.e., do while matching Node1 and Node2 values from both dfs)
I think you can just do:
left_join(df1, df2, by = c('Node1', 'Node2')) %>%
mutate(z_score = (Mean - value) / SD)
#> Node1 Node2 Mean SD value z_score
#> 1 1 2 2 0.10 1 10
#> 2 1 3 1 0.20 2 -5
#> 3 1 4 5 0.01 1 400
#> 4 2 3 3 0.20 4 -5
#> 5 2 4 2 0.01 6 -400
#> 6 3 4 3 0.01 7 -400