python, networkx, spyder, gml

How to keep specific nodes in NetworkX


I'm currently working on my college special-topic project. I can remove all the nodes I don't want, but I also want to keep some specific nodes. Here's how I do it:

1. Read the GML file into NetworkX.

2. Use this code to remove the websites I don't want, then write the result to a new GML file:

import networkx as nx

G = nx.read_gml('test.gml')
# Collect the unwanted nodes first: removing nodes while iterating
# over G.nodes directly raises a RuntimeError.
G.remove_nodes_from([node for node in G.nodes if "pu.edu.tw" not in node])
nx.write_gml(G, "finaltest.gml")

3. As this part of the resulting GML file shows, I successfully kept all the 'pu.edu.tw' websites:

graph [
directed 1
multigraph 1
node [
  id 0
  label "https://www.pu.edu.tw/"
]
node [
  id 1
  label "https://alumni.pu.edu.tw/"
]
node [
  id 2
  label "https://freshman.pu.edu.tw/"
]
node [
  id 3
  label "https://tdc.pu.edu.tw/"
]
node [
  id 4
  label "https://alcat.pu.edu.tw/"
]
node [
  id 5
  label "https://www.secretary.pu.edu.tw/"
]
node [
  id 6
  label "https://pugive.pu.edu.tw/"
]

4. The problem is that when I draw this GML file with NetworkX, some nodes have no edges.

5. I found out the reason: I deleted the websites that linked to the 'pu.edu.tw' nodes, and removing a node also removes its edges, so some edges are missing.
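The effect described in steps 4 and 5 can be reproduced on a tiny graph. This is a minimal sketch: the URLs (including https://example.com/) are made up, and the graph is built in memory instead of being read from test.gml. Because removing a node also removes every edge attached to it, two pu.edu.tw pages that were linked only through an outside page end up isolated.

```python
import networkx as nx

# Made-up graph: two pu.edu.tw pages linked only through an outside page.
G = nx.MultiDiGraph()
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_edge("https://example.com/", "https://alumni.pu.edu.tw/")

# Same filter as step 2: drop every node whose name lacks "pu.edu.tw".
# Removing a node also removes all edges attached to it.
G.remove_nodes_from([n for n in G.nodes if "pu.edu.tw" not in n])

print(G.number_of_edges())   # 0
print(list(nx.isolates(G)))  # both pu.edu.tw pages are now isolated
```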

How can I remove the websites I don't want while keeping the nodes that are connected to 'pu.edu.tw' nodes, so that those edges don't go missing? Or is there some way to reconnect the nodes? Thank you.
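For the "reconnect" option, one possible approach (a different technique from the answer below) is to keep only the 'pu.edu.tw' nodes and add a direct edge u -> v whenever v can be reached from u by following links through non-'pu.edu.tw' pages only. This sketch assumes a directed graph like the one read from test.gml; contract_to_pu is a hypothetical helper name, the URLs are made up, and the result is a plain nx.DiGraph, so parallel edges and edge attributes are not preserved.

```python
import networkx as nx

def contract_to_pu(g, key="pu.edu.tw"):
    # Hypothetical helper: build a new DiGraph on the key nodes only, with an
    # edge u -> v whenever v is reachable from u through non-key nodes only.
    keep = [n for n in g.nodes if key in n]
    h = nx.DiGraph()
    h.add_nodes_from(keep)
    for u in keep:
        stack = list(g.successors(u))
        seen = set()
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            if key in x:
                # Reached another key node: reconnect u to it and stop this branch.
                if x != u:
                    h.add_edge(u, x)
            else:
                # Outside page: keep walking through its outgoing links.
                stack.extend(g.successors(x))
    return h

# Made-up example: www -> example.com -> other.org -> alumni.
G = nx.MultiDiGraph()
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_edge("https://example.com/", "https://other.org/")
G.add_edge("https://other.org/", "https://alumni.pu.edu.tw/")

H = contract_to_pu(G)
print(list(H.edges))  # [('https://www.pu.edu.tw/', 'https://alumni.pu.edu.tw/')]
```

The contracted graph can then be written with nx.write_gml(H, "finaltest.gml"). It walks the graph once per pu.edu.tw node, which may be slow on a very large crawl.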

---------------------------------------------------------------------------------

Update, a follow-up question: what if I want to check multiple conditions, such as:

def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if ("tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw") not in node:
            for neighbor in g_aux.neighbors(node):
                if "tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw" in neighbor:
                    break
            else:
                g.remove_node(node)

Is this the right way to do it?
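As written, the conditions in this snippet only ever check the first domain. In Python, `or` returns its first truthy operand and `in` binds more tightly than `or`, so ("tku.edu.tw" or "scu.edu.tw" or ...) is just "tku.edu.tw", and "tku.edu.tw" or ... in neighbor is always truthy. A minimal sketch of one way to test several domains with any(); KEEP and matches() are illustrative names, not part of NetworkX:

```python
import networkx as nx

# What the original condition actually evaluates to: `or` returns its first
# truthy operand, so only "tku.edu.tw" is ever checked.
print("tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw")  # tku.edu.tw

# Domains from the question; KEEP and matches() are illustrative names.
KEEP = ("tku.edu.tw", "scu.edu.tw", "cycu.edu.tw", "fcu.edu.tw")

def matches(name):
    # True if the node name contains at least one of the wanted domains.
    return any(domain in name for domain in KEEP)

def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        # Remove a node only if neither it nor any of its neighbors matches.
        if not matches(node) and not any(matches(nb) for nb in g_aux.neighbors(node)):
            g.remove_node(node)
```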


Solution

  • One thing you can do is to keep every node that has a neighbor with "pu.edu.tw" in its name, in addition to the "pu.edu.tw" nodes themselves.

    Here's the full code:

    import networkx as nx
    
    def cleanup(g):
        g_aux = g.to_undirected()
        for node in g_aux.nodes:
            if "pu.edu.tw" not in node:
                for neighbor in g_aux.neighbors(node):
                    if "pu.edu.tw" in neighbor:
                        # Found
                        break
                else:
                    # Didn't find pu.edu.tw in any neighbors
                    g.remove_node(node)
    
    G = nx.read_gml('test.gml')
    cleanup(G)
    nx.write_gml(G,"finaltest.gml")
    

    The result is every node with "pu.edu.tw" in its name, plus its neighbors.
    Please note that I used an undirected version of the graph, g_aux = g.to_undirected(), so every neighbor of a "pu.edu.tw" node is kept regardless of the direction of the connecting edge.
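    To see both effects, here is a small sketch that runs the cleanup() above, unchanged, on a made-up graph (the URLs are placeholders, not taken from test.gml). example.com is only linked *from* pu.edu.tw pages, and on a directed graph neighbors() yields successors only, so it never returns a pu.edu.tw page for example.com; the node survives because cleanup checks the undirected copy. far-away.org is two hops from any pu.edu.tw page and is removed.

```python
import networkx as nx

def cleanup(g):
    # The answer's cleanup(), unchanged.
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if "pu.edu.tw" not in node:
            for neighbor in g_aux.neighbors(node):
                if "pu.edu.tw" in neighbor:
                    break
            else:
                g.remove_node(node)

# Made-up graph: two pu.edu.tw pages link TO example.com,
# which in turn links to far-away.org.
G = nx.MultiDiGraph()
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_edge("https://alumni.pu.edu.tw/", "https://example.com/")
G.add_edge("https://example.com/", "https://far-away.org/")

# Directed neighbors() yields successors only; the undirected copy sees both directions.
print(list(G.neighbors("https://example.com/")))  # ['https://far-away.org/']
print(sorted(G.to_undirected().neighbors("https://example.com/")))

cleanup(G)
print(sorted(G.nodes))  # far-away.org is gone, example.com survives
```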

    Here is some code to check whether any pu.edu.tw node has no neighbors:

    def check_isolated(g):
        for node in g.nodes:
            if "pu.edu.tw" in node:
                if g.degree[node] == 0:
                    print(node)
    

    If this prints anything before running cleanup, those nodes have no edges in the original graph, so they will stay isolated no matter what cleanup keeps.

    print("before")
    check_isolated(G)
    print("cleaning...")
    cleanup(G)
    print("after")
    check_isolated(G)
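    NetworkX also provides nx.isolates(g), which yields the nodes with degree 0. A minimal sketch of the same check built on it; isolated_pu_nodes is a hypothetical name, the URLs are made up, and it returns a list instead of printing so the result can be reused:

```python
import networkx as nx

def isolated_pu_nodes(g):
    # nx.isolates(g) yields the nodes with degree 0 (no in- or out-edges).
    return [n for n in nx.isolates(g) if "pu.edu.tw" in n]

# Made-up graph with one connected and one isolated pu.edu.tw page.
G = nx.MultiDiGraph()
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_node("https://alone.pu.edu.tw/")

print(isolated_pu_nodes(G))  # ['https://alone.pu.edu.tw/']
```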