Tags: python, networkx, gephi, network-analysis

How can I extract the nodes of the giant component from an edge list?


I want to extract the giant component from a Gephi graph. The graph I'm working with is too large for Gephi's own giant component function; Gephi just freezes. So I want to extract, from my edges.csv file, only the nodes that are part of the giant component. That way I can remove every node outside the giant component and make the file small enough for Gephi to handle.

I want to solve this with Python, and I know there is a Python library called networkx. Can my problem be solved easily with networkx? My edges.csv has the following format:

source, target, weight
nodeA, nodeB, 1
nodeA, nodeC, 1
nodeA, nodeD, 1
nodeB, nodeA, 1
nodeD, nodeB, 1

Solution

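  • You can read your graph from a pandas DataFrame, split it into connected components with nx.connected_components, and take the largest one. (Older answers use connected_component_subgraphs, but that function was deprecated in NetworkX 2.1 and removed in 2.4. connected_components returns one set of nodes per component, which you can pass to g.subgraph if you need the component as a graph.)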

    Example: reading the edge list into a NetworkX graph

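    import pandas as pd
    import networkx as nx
    # skipinitialspace drops the blank after each comma, so columns are "target", not " target"
    edge_list_df = pd.read_csv('edges.csv', skipinitialspace=True)
    g = nx.from_pandas_edgelist(edge_list_df, source='source',
                                target='target', edge_attr='weight')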
    

    Example: getting the connected components and the largest one

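    # connected_components yields one set of nodes per component (NetworkX 2.4+ safe)
    largest_cc_nodes = max(nx.connected_components(g), key=len)
    # .copy() gives an independent graph instead of a read-only view of g
    largest_component = g.subgraph(largest_cc_nodes).copy()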
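The steps above stop at the largest component, but the question asks for a smaller edges.csv that Gephi can open. Below is a minimal sketch of that last step. It assumes pandas and NetworkX 2.x or later and treats the graph as undirected. It builds the graph, takes the node set of the largest connected component, and keeps only the edge rows whose endpoints are in that set. The function name `filter_to_giant_component` and the extra `nodeE, nodeF` edge in the demo are hypothetical, added only for illustration.

```python
import io

import networkx as nx
import pandas as pd


def filter_to_giant_component(edges: pd.DataFrame,
                              source: str = "source",
                              target: str = "target") -> pd.DataFrame:
    """Keep only the edge rows whose endpoints lie in the largest
    connected component. Hypothetical helper, not a pandas/NetworkX API.
    The graph is treated as undirected (nx.Graph, the default)."""
    g = nx.from_pandas_edgelist(edges, source=source, target=target)
    # connected_components yields one set of node labels per component;
    # max(..., key=len) picks the giant one (raises ValueError if empty).
    giant_nodes = max(nx.connected_components(g), key=len)
    # An edge's two endpoints are always in the same component, so testing
    # both columns is redundant but makes the intent explicit.
    mask = edges[source].isin(giant_nodes) & edges[target].isin(giant_nodes)
    return edges[mask]


# Demo: the sample from the question plus a hypothetical stray edge
# (nodeE, nodeF) that forms a second, smaller component.
sample = io.StringIO(
    "source, target, weight\n"
    "nodeA, nodeB, 1\n"
    "nodeA, nodeC, 1\n"
    "nodeA, nodeD, 1\n"
    "nodeB, nodeA, 1\n"
    "nodeD, nodeB, 1\n"
    "nodeE, nodeF, 1\n"
)
# skipinitialspace drops the blank after each comma, so the columns are
# named "source", "target", "weight" rather than " target", " weight".
edges = pd.read_csv(sample, skipinitialspace=True)
giant = filter_to_giant_component(edges)
print(giant)
```

For the real file, read it with `pd.read_csv('edges.csv', skipinitialspace=True)` and write the result with `giant.to_csv('edges_giant.csv', index=False)`. The name `edges_giant.csv` is a hypothetical output name, and the `weight` column passes through unchanged. If you build a directed graph instead (`create_using=nx.DiGraph`), use `nx.weakly_connected_components`, because `nx.connected_components` is only implemented for undirected graphs.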