r igraph network-analysis

Is there an equivalent of the “largest strongly connected component (LSCC)” measure in igraph?


I am trying to replicate the “largest strongly connected component (LSCC)” measure, which does not seem to be available directly in the R 'igraph' package. The closest thing I could find is the “largest_component” function from 'igraph', but it only returns the bank names, with none of the numbers I need. So I guess it only lists the firm names instead of giving me values like those in the red box in the paper’s snapshot (below).

How can I replicate these LSCC values in R?

[Image: snapshot from the paper, with the LSCC values marked by a red box]

I tried R code like that in this link.

However, it did not give me numerical values of the kind I get for other network measures such as degree, closeness, and centrality.

The code I tried is below; it does not give any of the numerical values I need. I am not saying "largest_component" is the only available approach. I include it only because I was asked to show code in my question, even though it does not give me the answer:

> library(igraph)
> largest_component(data,mode = "strong")
IGRAPH cd5f7d1 DN-- 94 988 -- 
+ attr: name (v/c)
+ edges from cd5f7d1 (vertex names):
 [1] ABCB US Equity->COLB US Equity ABCB US Equity->FULT US Equity ABCB US Equity->GABC US Equity ABCB US Equity->INDB US Equity
 [5] ABCB US Equity->BOH US Equity  ABCB US Equity->HOPE US Equity ABCB US Equity->NTRS US Equity ABCB US Equity->ONB US Equity 
 [9] ABCB US Equity->PB US Equity   ABCB US Equity->RNST US Equity ABCB US Equity->STBA US Equity ABCB US Equity->SASR US Equity
[13] ABCB US Equity->SBCF US Equity ABCB US Equity->SBSI US Equity ABCB US Equity->TRMK US Equity ABCB US Equity->UBSI US Equity
[17] ABCB US Equity->WFC US Equity  ABCB US Equity->SRCE US Equity ABCB US Equity->FFBC US Equity ABCB US Equity->PFBC US Equity
[21] ABCB US Equity->CADE US Equity C US Equity   ->CFR US Equity  C US Equity   ->EEFT US Equity C US Equity   ->NBTB US Equity
[25] C US Equity   ->OCFC US Equity C US Equity   ->PNFP US Equity C US Equity   ->PB US Equity   C US Equity   ->STBA US Equity
[29] C US Equity   ->SASR US Equity C US Equity   ->SBSI US Equity C US Equity   ->WABC US Equity C US Equity   ->BANR US Equity
+ ... omitted several edges
> components(gD_12_2022,mode = "strong")

A snippet of my data is as follows:

 Date        i               j
09/2005 ABCB US Equity  CHCO US Equity
09/2005 ABCB US Equity  CHCO US Equity
09/2005 ABCB US Equity  CHCO US Equity
09/2005 ABCB US Equity  COLB US Equity
09/2005 ABCB US Equity  COLB US Equity
09/2005 ABCB US Equity  COLB US Equity
09/2005 ABCB US Equity  FITB US Equity
09/2005 ABCB US Equity  FNB US Equity
09/2005 ABCB US Equity  HBAN US Equity
09/2005 ABCB US Equity  HBAN US Equity
09/2005 ABCB US Equity  BOH US Equity
09/2005 ABCB US Equity  BOH US Equity
09/2005 ABCB US Equity  BOH US Equity
09/2005 ABCB US Equity  MTB US Equity
09/2005 ABCB US Equity  PNFP US Equity
09/2005 ABCB US Equity  SYBT US Equity
09/2005 ABCB US Equity  SYBT US Equity
09/2005 ABCB US Equity  SYBT US Equity
09/2005 ABCB US Equity  SIVBQ US Equity
09/2005 ABCB US Equity  SIVBQ US Equity
09/2005 ABCB US Equity  SIVBQ US Equity
09/2005 ABCB US Equity  TRMK US Equity
09/2005 ABCB US Equity  WFC US Equity
09/2005 ABCB US Equity  ZION US Equity
09/2005 ABCB US Equity  ZION US Equity
09/2005 ABCB US Equity  ZION US Equity
09/2005 ABCB US Equity  BRKL US Equity
09/2005 ABCB US Equity  BRKL US Equity
09/2005 ABCB US Equity  BRKL US Equity
09/2005 ABCB US Equity  CFFN US Equity

As I already mentioned, I have no problem producing other network measures such as degree and centrality; they work as shown below (a snippet of my output). It is only for the LSCC that I cannot find any suitable code in R:

Firms           Date    Degree       Closeness  Betweenness Clustering  Eigenvector
ABCB.US.Equity  09_2005 0.494623656 0.540697674 0.00484853  0.391304348 0.128802216
CHCO.US.Equity  09_2005 0.784946237 0.540697674 0.009498633 0.386809269 0.496275598
COLB.US.Equity  09_2005 0.612903226 0.550295858 0.007786027 0.386243386 0.265205502
FITB.US.Equity  09_2005 0.838709677 0.611842105 0.015768372 0.374307863 0.257392456
FNB.US.Equity   09_2005 0.494623656 0.436619718 0.002341893 0.383333333 0.47983247
HBAN.US.Equity  09_2005 0.64516129  0.510989011 0.008919628 0.342245989 0.507258004
BOH.US.Equity   09_2005 0.462365591 0.502702703 0.003176559 0.418300654 0.230633224
MTB.US.Equity   09_2005 0.731182796 0.513812155 0.011603933 0.346031746 0.560058904
PNFP.US.Equity  09_2005 0.731182796 0.502702703 0.006913042 0.406722689 0.668115579
SYBT.US.Equity  09_2005 1.494623656 0.611842105 0.041680331 0.352685051 0.70931023

Solution

  • The data snippet provided in the question contains only outgoing connections from a single bank, so it isn't adequate to demonstrate a solution to the problem. Here is a reproducible toy example that should suffice:

    library(igraph)
    
    set.seed(1)
    
    d <- replicate(2, sample(paste("Bank", LETTERS[1:10]), 10, TRUE)) |>
      as.data.frame()
    
    d <- unique(d[d[[1]] != d[[2]], ])
    data <- graph_from_data_frame(d)
    
    plot(data)
    

    [Image: plot of the toy graph]

    From my reading of your methodology paper, LSCC is the proportion of other banks in the data set that can be reached from each bank, following only outgoing edges. We can find this in igraph using the function subcomponent. For example, to find all the banks we can reach from Bank B, we can do:

    subcomponent(data, V(data)["Bank B"], "out")
    #> + 5/9 vertices, named, from 4b4f304:
    #> [1] Bank B Bank G Bank E Bank I Bank F
    

    You can confirm that banks G, E, I and F are all reachable from bank B in the above example.

    We are interested in obtaining the proportion of all the banks that bank B can reach (excluding itself). That would simply be the number of nodes in the subcomponent (minus bank B) divided by the total number of banks (minus bank B). In other words:

    (length(subcomponent(data, V(data)["Bank B"], "out")) - 1) / (length(V(data)) - 1)
    #> [1] 0.5
    

    This number means that half of the other banks can be reached from bank B.
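
Note that this per-bank proportion is not the same thing as the size of the largest strongly connected component itself, which is a single number for the whole graph. As a minimal sketch of the difference, assuming a hypothetical four-node graph (the `edges` data frame below is made up for illustration and is not the toy data above), the graph-level quantity can be read from `components()`:

```r
library(igraph)

# Hypothetical edge list: A -> B -> C -> A forms a cycle, and C also points to D
edges <- data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("B", "C", "A", "D")
)
g <- graph_from_data_frame(edges)

# Strongly connected components: membership per vertex and size per component
comp <- components(g, mode = "strong")

# Size of the largest strongly connected component, and its share of all nodes
lscc_size  <- max(comp$csize)
lscc_share <- lscc_size / vcount(g)

# Logical flag per vertex: does it belong to the largest component?
in_lscc <- comp$membership == which.max(comp$csize)
names(in_lscc) <- V(g)$name

lscc_size
lscc_share
in_lscc
```

Here A, B and C can all reach one another, so they form the largest strongly connected component, while D can be reached from them but cannot reach any of them back.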

    To get results for all the banks, we can use lapply:

    result <- lapply(V(data), function(v) {
      (length(subcomponent(data, v, "out")) - 1) / (length(V(data)) - 1)
    })
    
    result
    #> $`Bank I`
    #> [1] 0.125
    #> 
    #> $`Bank D`
    #> [1] 0.125
    #> 
    #> $`Bank G`
    #> [1] 0.375
    #> 
    #> $`Bank A`
    #> [1] 0.375
    #> 
    #> $`Bank B`
    #> [1] 0.5
    #> 
    #> $`Bank C`
    #> [1] 0.25
    #> 
    #> $`Bank E`
    #> [1] 0.125
    #> 
    #> $`Bank J`
    #> [1] 0
    #> 
    #> $`Bank F`
    #> [1] 0
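
As a cross-check on the loop above, the same proportions can likely be obtained without calling subcomponent once per vertex, by counting the finite entries in each row of the shortest-path matrix that `distances()` returns. This is only a sketch on a hypothetical four-node graph (the `edges` data frame is an assumption for illustration), and it builds a full n-by-n matrix, so it may not suit very large networks:

```r
library(igraph)

# Hypothetical edge list: A -> B -> C -> A forms a cycle, and C also points to D
edges <- data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("B", "C", "A", "D")
)
g <- graph_from_data_frame(edges)
n <- vcount(g)

# Shortest-path lengths following outgoing edges; Inf marks unreachable pairs
d <- distances(g, mode = "out")

# Each row counts the vertex itself (distance 0), so subtract 1
reach_matrix <- (rowSums(is.finite(d)) - 1) / (n - 1)

# Same quantity computed one vertex at a time with subcomponent()
reach_loop <- vapply(seq_len(n), function(i) {
  (length(subcomponent(g, i, "out")) - 1) / (n - 1)
}, numeric(1))

reach_matrix
all.equal(unname(reach_matrix), reach_loop)
```

Both approaches should agree, because a vertex is in the "out" subcomponent exactly when its outgoing shortest-path distance is finite.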
    

    And if you want this in a data frame, you can do:

    result |>
      as.data.frame(check.names = FALSE) |>
      t() |>
      as.data.frame() |>
      tibble::rownames_to_column() |>
      setNames(c('Bank', "LSCC")) |>
      dplyr::arrange(Bank)
    #>     Bank  LSCC
    #> 1 Bank A 0.375
    #> 2 Bank B 0.500
    #> 3 Bank C 0.250
    #> 4 Bank D 0.125
    #> 5 Bank E 0.125
    #> 6 Bank F 0.000
    #> 7 Bank G 0.375
    #> 8 Bank I 0.125
    #> 9 Bank J 0.000
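
Since the data in the question has a Date column, the same calculation could be repeated for each period. The following is a rough sketch under several assumptions: the column names `Date`, `i` and `j` follow the question's snippet, the rows are made up, the helper `reach_share` is a hypothetical name, and the denominator counts only banks that appear in at least one edge in that period (which may or may not match the paper's definition):

```r
library(igraph)

# Hypothetical edge list in the question's layout (Date, i, j)
edges <- data.frame(
  Date = c("09/2005", "09/2005", "09/2005", "10/2005", "10/2005"),
  i    = c("A", "B", "C", "A", "B"),
  j    = c("B", "C", "A", "B", "C")
)

# Hypothetical helper: share of other vertices reachable via outgoing edges
reach_share <- function(g) {
  d <- distances(g, mode = "out")
  (rowSums(is.finite(d)) - 1) / (vcount(g) - 1)
}

# One graph per period; duplicate rows are dropped (they do not affect reachability)
per_period <- lapply(split(edges, edges$Date), function(e) {
  g <- graph_from_data_frame(unique(e[, c("i", "j")]))
  s <- reach_share(g)
  data.frame(Date = e$Date[1], Bank = names(s), LSCC = unname(s))
})

lscc <- do.call(rbind, per_period)
rownames(lscc) <- NULL
lscc
```

In this made-up example every bank reaches every other bank in 09/2005 (a three-bank cycle), whereas in 10/2005 the chain A -> B -> C gives A a value of 1, B a value of 0.5 and C a value of 0.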