Tags: r, network-programming, tidyverse

How do I remove networks with fewer than 10 nodes using the egor package?


I have a set of ego-centric networks which I am analyzing using the egor package. The egor object is basically a list of three data tables named ego, alter and aatie.

The ego data has one row per egoID. The alter data has multiple rows per egoID, plus a column called "nodeID" which starts at 1 and runs sequentially up to the number of rows for that egoID. Essentially, I'd like the ego data to have a column that counts the number of rows that exist for each egoID in the alter data.

I'd like to remove all networks which have fewer than 10 alters. I thought I would start by creating a new column in the ego dataset called 'netsize' which has the number of alters for that ego. I can then filter from there. For the first step, I've tried several types of code, but to give you an idea of the mess I've made, here's one example:

egor.obj$ego <- egor.obj$ego %>%
  mutate(netsize = map_dbl(egor.obj$alter, ~ max(.x$nodeID, na.rm = TRUE)))

I receive the following error:

Error in mutate():
ℹ In argument: netsize = map_dbl(egor.obj$alter, ~max(.x$nodeID, na.rm = TRUE)).
Caused by error in map_dbl():
ℹ In index: 1.
ℹ With name: .altID.
Caused by error in .x$nodeID:
! $ operator is invalid for atomic vectors

I know egor can work with tidyverse but I'm quite new to both, and the egor object type is confusing me. What can I try next?


Solution

  • If I understand your data structure correctly, try something like this:

    egor.obj$ego = egor.obj$ego %>% 
        left_join(egor.obj$alter %>% 
                    count(egoID, name = "netsize"), 
                  by = "egoID")
    

    This counts the rows in alter for each egoID, which gives a data frame with two columns, egoID and netsize, and then joins that to the ego data frame.
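
    To make the behaviour concrete, here is a minimal sketch on two made-up data frames standing in for egor.obj$ego and egor.obj$alter. The toy values and the coalesce() step are my additions, and I'm assuming both tables carry an egoID column as you describe. Note that an ego with no rows in alter gets NA from the join, so you may want to turn that into 0 before filtering:

    ```r
    library(dplyr)

    # Hypothetical stand-ins for egor.obj$ego and egor.obj$alter
    ego   <- data.frame(egoID = c(1, 2, 3))
    alter <- data.frame(egoID  = c(1, 1, 1, 2),
                        nodeID = c(1, 2, 3, 1))

    # One row per egoID that appears in alter, with the row count as netsize
    sizes <- alter %>% count(egoID, name = "netsize")

    # Ego 3 has no alters, so the join leaves netsize as NA;
    # coalesce() replaces that NA with 0
    ego <- ego %>%
      left_join(sizes, by = "egoID") %>%
      mutate(netsize = coalesce(netsize, 0L))

    ego
    ```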

    Another option uses the maximum nodeID per egoID, which equals the number of alters as long as nodeID really runs from 1 to n without gaps:

    egor.obj$ego = egor.obj$ego %>% 
      left_join(egor.obj$alter %>% 
                  summarise(netsize = max(nodeID, na.rm = TRUE), .by = "egoID"),
                by = "egoID")
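
    This second version is closest to what your original attempt was going for. That attempt fails because a data frame is a list of columns, so map_dbl(egor.obj$alter, ...) hands each column (starting with .altID, per the error message) to the function as a plain vector, and $ cannot be used on a plain vector. A small sketch on made-up data (the alter table below is hypothetical) shows the difference between iterating over columns and summarising by group:

    ```r
    library(dplyr)
    library(purrr)

    # Hypothetical stand-in for egor.obj$alter
    alter <- data.frame(egoID  = c(1, 1, 1, 2, 2),
                        nodeID = c(1, 2, 3, 1, 2))

    # map_*() over a data frame visits columns, not egos:
    # the result has one element per column
    col_classes <- map_chr(alter, ~ class(.x)[1])
    col_classes

    # Grouped summary: one row per egoID with the largest nodeID
    # (the .by argument needs dplyr >= 1.1.0)
    sizes <- alter %>%
      summarise(netsize = max(nodeID, na.rm = TRUE), .by = egoID)
    sizes
    ```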
    

    Check this article on how to filter and subset egor datasets.

    This should work for your purpose (not tested):

    egor.new = subset(egor.obj, subset = egor.obj$ego$netsize >= 10, unit="ego")
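
    Since I could not test the subset() call, here is the same filtering logic as a sketch on plain data frames, which you can use to check which egos should survive. The data are made up, the threshold is lowered to 2 to keep the example small, and min_alters is just an illustrative name. For a real egor object you would still want to filter through egor itself rather than by hand, so that the ego, alter and aatie levels stay in sync:

    ```r
    library(dplyr)

    # Hypothetical stand-ins for the ego and alter levels
    ego   <- data.frame(egoID = c(1, 2, 3), netsize = c(3L, 1L, 2L))
    alter <- data.frame(egoID  = c(1, 1, 1, 2, 3, 3),
                        nodeID = c(1, 2, 3, 1, 1, 2))

    min_alters <- 2  # would be 10 for the question as asked

    # Keep egos whose network is large enough
    ego_kept <- ego %>% filter(netsize >= min_alters)

    # Keep only the alters that belong to the remaining egos
    alter_kept <- alter %>% semi_join(ego_kept, by = "egoID")

    ego_kept$egoID
    nrow(alter_kept)
    ```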