rpanel-datageostatistics

Creating dyad-pair averages in R


I want to create a pair-wise average of price of commodities produced by countries. My data looks like this

df <- data.frame(country    = c("US; UK; FI", "CN; IT; US; GR", "UK; US"),
                 product_id = c(1, 2, 3),
                 price      = c(300, 500, 200))

I want to transform the data to create average of price between dyads of two countries. Something like this:

Ctr_1 Ctr_2 Avg_Price
US    UK    250
US    FI    300
US    CN    500
US    IT    500
UK    FI    300
UK    US    250
CN    IT    500
CN    US    500
CN    GR    500
IT    CN    500
IT    US    500
IT    GR    500
GR    CN    500
GR    IT    500
GR    US    500

I tried changing the data to long form.

library(data.table)

setDT(df)

df1 <- df[, .(country = unlist(strsplit(country, "; "))), by = .(product_id)]

But didn't know how to proceed from here. Any help would be really appreciated. In fact, there is a year variable as well, and the idea is to aggregate pair-wise per year to create a panel dataset.


Solution

  • df1 <- df[, .(country = strsplit(country, '; ')[[1]]), by = .(product_id, price)]
    
    # join product_id and price of c1 (CJ for cross-join)
    df2 <- df1[CJ(country, c2 = country),
               on = .(country), allow.cartesian = TRUE][country < c2]  # keep uniq pairs
    
    # join product_id and price of c2, then get average
    res <- df1[df2, on = .(country = c2, product_id), nomatch = 0][
      , .(avg_price = mean(price)), by = .(c1 = country, c2 = i.country)]
    
    res
    #    c1 c2 avg_price
    # 1: GR CN       500
    # 2: IT CN       500
    # 3: US CN       500
    # 4: UK FI       300
    # 5: US FI       300
    # 6: IT GR       500
    # 7: US GR       500
    # 8: US IT       500
    # 9: US UK       250