
Restructuring data for geographic proximity analyses in R


I have a data set of people's geographic coordinates, which looks like this:

Person  Latitude    Longitude
  1     46.0614     -23.9386
  2     48.1792      63.1136
  3     59.9289      66.3883
  4     42.8167      58.3167
  5     43.1167      63.25

I plan to calculate geographic proximity at the dyadic level using the geosphere package in R. To do that, I need to create a data set that looks like this:

Person1 Person2 LatitudeP1  LongitudeP1 LatitudeP2  LongitudeP2
   1       2     46.0614    -23.9386     48.1792     63.1136
   1       3     46.0614    -23.9386     59.9289     66.3883
   1       4     46.0614    -23.9386     42.8167     58.3167
   1       5     46.0614    -23.9386     43.1167     63.25
   2       3     48.1792     63.1136     59.9289     66.3883
   2       4     48.1792     63.1136     42.8167     58.3167
   2       5     48.1792     63.1136     43.1167     63.25
   3       4     59.9289     66.3883     42.8167     58.3167
   3       5     59.9289     66.3883     43.1167     63.25
   4       5     42.8167     58.3167     43.1167     63.25

Thus, the resulting data has a row for each possible dyad in the data set, and includes the coordinates of both individuals in the dyad. "LatitudeP1" and "LongitudeP1" are the coordinates for "Person1" in the dyad, and "LatitudeP2" and "LongitudeP2" are the coordinates for "Person2" in the dyad. Also, it doesn't matter which ID is listed as Person1 versus Person2, since geographic distance is not a directed relationship.


Solution

  • Take all possible pairs (combn) of persons 1 through 5, then use each pair to subset the latitude/longitude from your original data:

    dat <- read.table(header = TRUE, text="Person  Latitude    Longitude
    1     46.0614     -23.9386
    2     48.1792      63.1136
    3     59.9289      66.3883
    4     42.8167      58.3167
    5     43.1167      63.25")
    
    # All unordered pairs of row indices, one pair per row
    tmp <- t(combn(nrow(dat), 2))
    
    #       [,1] [,2]
    #  [1,]    1    2
    #  [2,]    1    3
    #  [3,]    1    4
    #  [4,]    1    5
    #  [5,]    2    3
    #  [6,]    2    4
    #  [7,]    2    5
    #  [8,]    3    4
    #  [9,]    3    5
    # [10,]    4    5
    
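    # For each column of tmp (first and second member of the dyad), pull the
    # matching Latitude/Longitude values from dat, then bind everything together.
    # Note: tmp holds row indices, which coincide with the Person IDs in this data.
    res <- cbind(tmp,
                 do.call('cbind', lapply(1:2, function(x)
                   mapply(`[`, dat[, 2:3], MoreArgs = list(i = tmp[, x])))))
    colnames(res) <- c('Person1','Person2','LatitudeP1','LongitudeP1',
                       'LatitudeP2','LongitudeP2')
    
    data.frame(res)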
    
    #    Person1 Person2 LatitudeP1 LongitudeP1 LatitudeP2 LongitudeP2
    # 1        1       2    46.0614    -23.9386    48.1792     63.1136
    # 2        1       3    46.0614    -23.9386    59.9289     66.3883
    # 3        1       4    46.0614    -23.9386    42.8167     58.3167
    # 4        1       5    46.0614    -23.9386    43.1167     63.2500
    # 5        2       3    48.1792     63.1136    59.9289     66.3883
    # 6        2       4    48.1792     63.1136    42.8167     58.3167
    # 7        2       5    48.1792     63.1136    43.1167     63.2500
    # 8        3       4    59.9289     66.3883    42.8167     58.3167
    # 9        3       5    59.9289     66.3883    43.1167     63.2500
    # 10       4       5    42.8167     58.3167    43.1167     63.2500
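
As a follow-up, here is a minimal sketch of the step the question is building toward: one distance per dyad. It rebuilds the dyad table by indexing `dat` with the two columns of the `combn` result, so `Person1`/`Person2` carry the actual Person IDs rather than row numbers, and then applies a hand-written haversine formula in base R. The helper `haversine_m` and the `Distance_m` column are hypothetical names, not part of the original answer, and the Earth radius of 6378137 m is an assumption chosen to match the default of `geosphere::distHaversine`. If geosphere is installed, the last lines show the equivalent call; note that geosphere expects coordinates in longitude, latitude order.

```r
dat <- read.table(header = TRUE, text = "Person Latitude Longitude
1 46.0614 -23.9386
2 48.1792  63.1136
3 59.9289  66.3883
4 42.8167  58.3167
5 43.1167  63.25")

# All unordered pairs of row indices, one pair per row
idx <- t(combn(nrow(dat), 2))
p1 <- dat[idx[, 1], ]  # rows for the first member of each dyad
p2 <- dat[idx[, 2], ]  # rows for the second member of each dyad

dyads <- data.frame(
  Person1     = p1$Person,
  Person2     = p2$Person,
  LatitudeP1  = p1$Latitude,
  LongitudeP1 = p1$Longitude,
  LatitudeP2  = p2$Latitude,
  LongitudeP2 = p2$Longitude
)

# Hypothetical helper: great-circle distance in metres (haversine formula).
# Inputs are in decimal degrees; r is the assumed Earth radius in metres.
haversine_m <- function(lat1, lon1, lat2, lon2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

dyads$Distance_m <- with(
  dyads,
  haversine_m(LatitudeP1, LongitudeP1, LatitudeP2, LongitudeP2)
)

# Optional: the same calculation with geosphere, if it is available.
# geosphere takes two-column matrices in (longitude, latitude) order.
if (requireNamespace("geosphere", quietly = TRUE)) {
  dyads$Distance_geosphere <- geosphere::distHaversine(
    cbind(dyads$LongitudeP1, dyads$LatitudeP1),
    cbind(dyads$LongitudeP2, dyads$LatitudeP2)
  )
}

dyads
```

Indexing the data frame twice keeps the code short and avoids the matrix coercion in `cbind`, which would turn every column into the same type if `dat` ever gained a non-numeric column.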