[SOLVED] Graph learning in R, igraph, tidygraph

Graph learning in R, igraph, tidygraph

I have a graph with each node having a value (value in red).

I would like to do the following two things (I guess 1 is a special case of 2):

Each node should be assigned the mean of the value of the direct peers directing to it. For example node #5 (1+2)/2=1.5 or node #3 (0+2+0)/3=2/3.
Instead of direct neighbors, include all connected nodes but with a diffusion of times 1/n with n being the distance to the node. The further away the information is coming from the weaker signal we'd have.

I looked into functions of igraph, but could not find anything that is doing this (I might have overseen though). How could I do this computation?

Below is the code for a sample network with random values.

library(tidyverse)
library(tidygraph)
library(ggraph)

set.seed(6)
q <- tidygraph::play_erdos_renyi(6, p = 0.2) %>% 
  mutate(id = row_number(),
         value = sample(0:3, size = 6, replace = T))
q %>% 
  ggraph(layout = "with_fr") +
  geom_edge_link(arrow = arrow(length = unit(0.2, "inches"), 
                               type = "closed")) +
  geom_node_label(aes(label = id)) +
  geom_node_text(aes(label = value), color = "red", size = 7, 
                 nudge_x = 0.2, nudge_y = 0.2)

Edit, found a solution to 1

q %>% 
  mutate(value_smooth = map_local_dbl(order = 1, mindist = 1, mode = "in", 
                                      .f = function(neighborhood, ...) {
    mean(as_tibble(neighborhood, active = 'nodes')$value)
  }))

Edit 2, solution to 2, not the most elegant I guess

q %>% 
  mutate(value_smooth = map_local_dbl(order = 1, mindist = 0, mode = "in", 
                                      .f = function(neighborhood, node, ...) {
    ne <- neighborhood
    
    ne <- ne %>%
      mutate(d = node_distance_to(which(as_tibble(ne, 
                                                  active = "nodes")$id == node)))
    
    as_tibble(ne, active = 'nodes') %>% 
      filter(d != 0) %>% 
      mutate(helper = value/d) %>% 
      summarise(m = mean(value)) %>% 
      pull(m)
    }))

Edit 3, a faster alternative to map_local_dbl

map_local loops through all nodes of the graph. For large graphs, this takes very long. For just computing the means, this is not needed. A much faster alternative is to use the adjacency matrix and some matrix multiplication.

q_adj <- q %>% 
  igraph::as_adjacency_matrix()

# out
(q_adj %*% as_tibble(q)$value) / Matrix::rowSums(q_adj)

# in
(t(q_adj) %*% as_tibble(q)$value) / Matrix::colSums(q_adj)

The square of the adjacency matrix is the second order adjacency matrix, and so forth. So a solution to problem 2 could also be created.

Edit 4, direct weighted mean

Say the original graph has weights associated to each edge.

q <- q %>% 
  activate(edges) %>% 
  mutate(w = c(1,0.5,1,0.5,1,0.5,1)) %>% 
  activate(nodes)

We would like to compute the weighted mean of the direct peers' value.

q_adj_wgt <- q %>% 
  igraph::as_adjacency_matrix(attr = "w")

# out
(q_adj_wgt %*% as_tibble(q)$value) / Matrix::rowSums(q_adj_wgt)

# in
(t(q_adj_wgt) %*% as_tibble(q)$value) / Matrix::colSums(q_adj_wgt)

Solution

Probably you can try the code below

q %>%
    set_vertex_attr(
        name = "value",
        value = sapply(
            ego(., mode = "in", mindist = 1),
            function(x) mean(x$value)
        )
    )

which gives

# A tbl_graph: 6 nodes and 7 edges
#
# A directed simple graph with 1 component
#
# Node Data: 6 x 2 (active)
     id   value
  <int>   <dbl>
1     1   0.5
2     2 NaN
3     3   0.667
4     4 NaN
5     5   1.5
6     6 NaN
#
# Edge Data: 7 x 2
   from    to
  <int> <int>
1     3     1
2     6     1
3     1     3
# ... with 4 more rows