I want to create a pair-wise average of price of commodities produced by countries. My data looks like this
df <- data.frame(country = c("US; UK; FI", "CN; IT; US; GR", "UK; US"),
product_id = c(1, 2, 3),
price = c(300, 500, 200))
I want to transform the data to create average of price between dyads of two countries. Something like this:
Ctr_1 Ctr_2 Avg_Price
US UK 250
US FI 300
US CN 500
US IT 500
UK FI 300
UK US 250
CN IT 500
CN US 500
CN GR 500
IT CN 500
IT US 500
IT GR 500
GR CN 500
GR IT 500
GR US 500
I tried changing the data to long form.
library(data.table)
setDT(df)
df1 <- df[, .(country = unlist(strsplit(country, "; "))), by = .(product_id)]
But didn't know how to proceed from here. Any help would be really appreciated. In fact, there is a year variable as well, and the idea is to aggregate pair-wise per year to create a panel dataset.
df1 <- df[, .(country = strsplit(country, '; ')[[1]]), by = .(product_id, price)]
# join product_id and price of c1 (CJ for cross-join)
df2 <- df1[CJ(country, c2 = country),
on = .(country), allow.cartesian = TRUE][country < c2] # keep uniq pairs
# join product_id and price of c2, then get average
res <- df1[df2, on = .(country = c2, product_id), nomatch = 0][
, .(avg_price = mean(price)), by = .(c1 = country, c2 = i.country)]
res
# c1 c2 avg_price
# 1: GR CN 500
# 2: IT CN 500
# 3: US CN 500
# 4: UK FI 300
# 5: US FI 300
# 6: IT GR 500
# 7: US GR 500
# 8: US IT 500
# 9: US UK 250