I am trying to replicate the “largest strongly connected component (LSCC)” measure, which does not seem to be available directly in the R 'igraph' package. The closest I could find was the “largest_component” function from 'igraph', but it only returns a list of edges between bank names, not the numbers I need. So I guess it only lists the firm names instead of giving me values like those in the red box (below) in the paper’s snapshot.
How can I replicate these LSCC values in R?
I tried to use R code like that in this link
However, it did not give the numerical values I need, comparable to measures such as degree, closeness, or betweenness.
The code I tried is below (it does not give any of the numerical values I need). I am not saying "largest_component" is the only available approach. I am including it only because I was asked to show what I had tried, even though it did not answer my question:
> library(igraph)
> largest_component(data,mode = "strong")
IGRAPH cd5f7d1 DN-- 94 988 --
+ attr: name (v/c)
+ edges from cd5f7d1 (vertex names):
[1] ABCB US Equity->COLB US Equity ABCB US Equity->FULT US Equity ABCB US Equity->GABC US Equity ABCB US Equity->INDB US Equity
[5] ABCB US Equity->BOH US Equity ABCB US Equity->HOPE US Equity ABCB US Equity->NTRS US Equity ABCB US Equity->ONB US Equity
[9] ABCB US Equity->PB US Equity ABCB US Equity->RNST US Equity ABCB US Equity->STBA US Equity ABCB US Equity->SASR US Equity
[13] ABCB US Equity->SBCF US Equity ABCB US Equity->SBSI US Equity ABCB US Equity->TRMK US Equity ABCB US Equity->UBSI US Equity
[17] ABCB US Equity->WFC US Equity ABCB US Equity->SRCE US Equity ABCB US Equity->FFBC US Equity ABCB US Equity->PFBC US Equity
[21] ABCB US Equity->CADE US Equity C US Equity ->CFR US Equity C US Equity ->EEFT US Equity C US Equity ->NBTB US Equity
[25] C US Equity ->OCFC US Equity C US Equity ->PNFP US Equity C US Equity ->PB US Equity C US Equity ->STBA US Equity
[29] C US Equity ->SASR US Equity C US Equity ->SBSI US Equity C US Equity ->WABC US Equity C US Equity ->BANR US Equity
+ ... omitted several edges
> components(gD_12_2022,mode = "strong")
And the snippet of my data is as follows:
Date i j
09/2005 ABCB US Equity CHCO US Equity
09/2005 ABCB US Equity CHCO US Equity
09/2005 ABCB US Equity CHCO US Equity
09/2005 ABCB US Equity COLB US Equity
09/2005 ABCB US Equity COLB US Equity
09/2005 ABCB US Equity COLB US Equity
09/2005 ABCB US Equity FITB US Equity
09/2005 ABCB US Equity FNB US Equity
09/2005 ABCB US Equity HBAN US Equity
09/2005 ABCB US Equity HBAN US Equity
09/2005 ABCB US Equity BOH US Equity
09/2005 ABCB US Equity BOH US Equity
09/2005 ABCB US Equity BOH US Equity
09/2005 ABCB US Equity MTB US Equity
09/2005 ABCB US Equity PNFP US Equity
09/2005 ABCB US Equity SYBT US Equity
09/2005 ABCB US Equity SYBT US Equity
09/2005 ABCB US Equity SYBT US Equity
09/2005 ABCB US Equity SIVBQ US Equity
09/2005 ABCB US Equity SIVBQ US Equity
09/2005 ABCB US Equity SIVBQ US Equity
09/2005 ABCB US Equity TRMK US Equity
09/2005 ABCB US Equity WFC US Equity
09/2005 ABCB US Equity ZION US Equity
09/2005 ABCB US Equity ZION US Equity
09/2005 ABCB US Equity ZION US Equity
09/2005 ABCB US Equity BRKL US Equity
09/2005 ABCB US Equity BRKL US Equity
09/2005 ABCB US Equity BRKL US Equity
09/2005 ABCB US Equity CFFN US Equity
For other network measures such as degree and centrality, as I already mentioned, I have no problem producing them (a snippet of my output is shown below). It is only for the LSCC that I cannot find any suitable R code:
Firms Date Degree Closeness Betweenness Clustering Eigenvector
ABCB.US.Equity 09_2005 0.494623656 0.540697674 0.00484853 0.391304348 0.128802216
CHCO.US.Equity 09_2005 0.784946237 0.540697674 0.009498633 0.386809269 0.496275598
COLB.US.Equity 09_2005 0.612903226 0.550295858 0.007786027 0.386243386 0.265205502
FITB.US.Equity 09_2005 0.838709677 0.611842105 0.015768372 0.374307863 0.257392456
FNB.US.Equity 09_2005 0.494623656 0.436619718 0.002341893 0.383333333 0.47983247
HBAN.US.Equity 09_2005 0.64516129 0.510989011 0.008919628 0.342245989 0.507258004
BOH.US.Equity 09_2005 0.462365591 0.502702703 0.003176559 0.418300654 0.230633224
MTB.US.Equity 09_2005 0.731182796 0.513812155 0.011603933 0.346031746 0.560058904
PNFP.US.Equity 09_2005 0.731182796 0.502702703 0.006913042 0.406722689 0.668115579
SYBT.US.Equity 09_2005 1.494623656 0.611842105 0.041680331 0.352685051 0.70931023
The data snippet provided in the question only contains outgoing connections from a single bank, so it isn't adequate to demonstrate a solution to the problem. Here is a reproducible toy example that should suffice:
library(igraph)
set.seed(1)
d <- replicate(2, sample(paste("Bank", LETTERS[1:10]), 10, TRUE)) |>
as.data.frame()
d <- unique(d[d[[1]] != d[[2]], ])
data <- graph_from_data_frame(d)
plot(data)
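For an edge list shaped like the one in the question (columns Date, i and j), one graph per date could be built along the following lines. This is a minimal sketch: the rows below are illustrative (the "10/2005" date is made up for the example), and the name `edges` is an assumption, not the asker's actual object.

```r
library(igraph)

# Illustrative edge list in the question's layout (Date, i, j)
edges <- data.frame(
  Date = c("09/2005", "09/2005", "09/2005", "10/2005"),
  i = c("ABCB US Equity", "ABCB US Equity", "CHCO US Equity", "ABCB US Equity"),
  j = c("CHCO US Equity", "CHCO US Equity", "COLB US Equity", "COLB US Equity")
)

# Drop self-loops and duplicated rows so each directed edge appears once per date
edges <- unique(edges[edges$i != edges$j, ])

# Split the (i, j) columns by date and build one directed graph per date
graphs <- lapply(split(edges[c("i", "j")], edges$Date), graph_from_data_frame)
```

Each element of `graphs` can then be passed to the per-bank calculation in place of the toy graph.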
From my reading of your methodology paper, LSCC is the proportion of other banks in the data set that can be reached from each bank, following only outgoing edges. We can find this in igraph using the function subcomponent. For example, to find all the banks we can reach from Bank B, we can do:
subcomponent(data, V(data)["Bank B"], "out")
#> + 5/9 vertices, named, from 4b4f304:
#> [1] Bank B Bank G Bank E Bank I Bank F
You can confirm that banks G, E, I and F are all reachable from bank B in the above example.
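One way to cross-check reachability without reading the plot is distances(): a finite distance along outgoing edges means the target can be reached. A minimal sketch on a small hypothetical graph (the vertex names here are illustrative, not the toy data above):

```r
library(igraph)

# Hypothetical toy graph: A -> B -> G -> E
g <- graph_from_data_frame(data.frame(
  from = c("B", "G", "A"),
  to   = c("G", "E", "B")
))

# Shortest-path lengths from B along outgoing edges; Inf marks unreachable vertices
d <- distances(g, v = "B", mode = "out")

# Names of the vertices reachable from B, excluding B itself
reachable <- colnames(d)[is.finite(d[1, ]) & colnames(d) != "B"]
```

Here `reachable` should contain the same vertices as subcomponent(g, "B", "out"), minus B itself.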
We are interested in obtaining the proportion of all the banks that bank B can reach (excluding itself). That would simply be the number of nodes in the subcomponent (minus bank B) divided by the total number of banks (minus bank B). In other words:
(length(subcomponent(data, V(data)["Bank B"], "out")) - 1) / (length(V(data)) - 1)
#> [1] 0.5
This number means that half of the other banks can be reached from bank B.
To get results for all the banks, we can use lapply:
result <- lapply(V(data), function(v) {
(length(subcomponent(data, v, "out")) - 1) / (length(V(data)) - 1)
})
result
#> $`Bank I`
#> [1] 0.125
#>
#> $`Bank D`
#> [1] 0.125
#>
#> $`Bank G`
#> [1] 0.375
#>
#> $`Bank A`
#> [1] 0.375
#>
#> $`Bank B`
#> [1] 0.5
#>
#> $`Bank C`
#> [1] 0.25
#>
#> $`Bank E`
#> [1] 0.125
#>
#> $`Bank J`
#> [1] 0
#>
#> $`Bank F`
#> [1] 0
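If a named numeric vector is more convenient than a list, vapply can be used instead of lapply. A minimal sketch on a hypothetical four-node chain (the graph `g` and the name `reach` are assumptions for illustration):

```r
library(igraph)

# Hypothetical toy graph: D -> A -> B -> C
g <- graph_from_data_frame(data.frame(
  from = c("D", "A", "B"),
  to   = c("A", "B", "C")
))

n <- vcount(g)

# For each vertex, the share of the other vertices reachable via outgoing edges
reach <- vapply(
  seq_len(n),
  function(i) (length(subcomponent(g, i, mode = "out")) - 1) / (n - 1),
  numeric(1)
)
names(reach) <- V(g)$name
```

Note that the denominator is zero for a graph with a single vertex, so that case would need separate handling.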
And if you want this in a data frame, you can do:
result |>
as.data.frame(check.names = FALSE) |>
t() |>
as.data.frame() |>
tibble::rownames_to_column() |>
setNames(c('Bank', "LSCC")) |>
dplyr::arrange(Bank)
#> Bank LSCC
#> 1 Bank A 0.375
#> 2 Bank B 0.500
#> 3 Bank C 0.250
#> 4 Bank D 0.125
#> 5 Bank E 0.125
#> 6 Bank F 0.000
#> 7 Bank G 0.375
#> 8 Bank I 0.125
#> 9 Bank J 0.000
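If the paper's measure instead refers to the largest strongly connected component in the graph-theoretic sense (the largest set of banks that can all reach one another), its size can be read from components(). This is a minimal sketch under that alternative reading, using a hypothetical toy graph rather than the data above:

```r
library(igraph)

# Hypothetical toy graph: A, B and C form a cycle; D only receives an edge
g <- graph_from_data_frame(data.frame(
  from = c("A", "B", "C", "C"),
  to   = c("B", "C", "A", "D")
))

# Strongly connected components: membership per vertex and size per component
scc <- components(g, mode = "strong")

# Size of the largest component, and its share of all vertices
largest_size  <- max(scc$csize)
largest_share <- largest_size / vcount(g)

# Logical flag per bank: does it belong to the largest component?
in_largest <- scc$membership == which.max(scc$csize)
names(in_largest) <- V(g)$name
```

This gives one number per graph (and a membership flag per bank) rather than one proportion per bank, so which version matches the paper depends on how it defines LSCC.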