I'm currently working on a special-topic project for college. My problem is that I can remove all the nodes I don't want, but I also want to keep some specific nodes. Here's how I do it:
1. Read the GML file into NetworkX.
2. Use this code to remove the websites I don't want, and then write the result to a new GML file:
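import networkx as nx

G = nx.read_gml('test.gml')
# Remove one unwanted node per pass; the inner loop restarts after each
# removal because the node set must not change while it is being iterated.
for i in range(2000):
    for node in G.nodes:
        if "pu.edu.tw" not in node:
            G.remove_node(node)
            break
nx.write_gml(G, "finaltest.gml")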
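A shorter way to write the same filter, sketched here under the assumption that node names are the URL labels (which is what `nx.read_gml` uses by default), is to collect the unwanted nodes into a list first and remove them in one call. This also avoids the loop's hard limit of 2000 removals. The helper name `keep_only` and the toy URLs are hypothetical:

```python
import networkx as nx

def keep_only(g, keyword):
    """Remove (in place) every node whose name does not contain `keyword`."""
    # Build the list first so the graph is not modified while iterating over it.
    unwanted = [node for node in g.nodes if keyword not in node]
    g.remove_nodes_from(unwanted)
    return g

# Small in-memory graph instead of test.gml (hypothetical URLs).
G = nx.MultiDiGraph()
G.add_edge("https://www.pu.edu.tw/", "https://alumni.pu.edu.tw/")
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")

keep_only(G, "pu.edu.tw")
print(sorted(G.nodes))
```

Like the original loop, this still drops the edges attached to the removed nodes.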
3. As you can see in this part of the resulting GML file, I successfully kept all the 'pu.edu.tw' websites:
graph [
  directed 1
  multigraph 1
  node [
    id 0
    label "https://www.pu.edu.tw/"
  ]
  node [
    id 1
    label "https://alumni.pu.edu.tw/"
  ]
  node [
    id 2
    label "https://freshman.pu.edu.tw/"
  ]
  node [
    id 3
    label "https://tdc.pu.edu.tw/"
  ]
  node [
    id 4
    label "https://alcat.pu.edu.tw/"
  ]
  node [
    id 5
    label "https://www.secretary.pu.edu.tw/"
  ]
  node [
    id 6
    label "https://pugive.pu.edu.tw/"
  ]
4. The problem is that when I try to draw this GML file with NetworkX, I get some nodes without edges.
5. I found out that the reason is that I deleted the other websites linked to the 'pu.edu.tw' pages, so the edges that went through them are missing too.
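This matches how `remove_node` behaves: deleting a node also deletes every edge attached to it, so two 'pu.edu.tw' pages that were only connected through an outside site end up isolated. A small sketch with made-up URLs:

```python
import networkx as nx

G = nx.MultiDiGraph()
# Two pu.edu.tw pages connected only through an outside site (hypothetical URLs).
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_edge("https://example.com/", "https://alumni.pu.edu.tw/")

G.remove_node("https://example.com/")  # also removes both edges touching it

print(G.number_of_edges())  # 0
print(list(nx.isolates(G)))  # both pu.edu.tw pages are now isolated
```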
I want to know how to remove the websites I don't want while still keeping the nodes that are connected to 'pu.edu.tw' websites, so that no edges go missing, or alternatively some way to reconnect the nodes. Thank you.
---------------------------------------------------------------------------------
Update with a new question: what if I want to add multiple conditions, such as:
def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if ("tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw") not in node:
            for neighbor in g_aux.neighbors(node):
                if "tku.edu.tw" or "scu.edu.tw" or "cycu.edu.tw" or "fcu.edu.tw" in neighbor:
                    break
            else:
                g.remove_node(node)
Is this the right way to do it?
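In that version, `("tku.edu.tw" or "scu.edu.tw" or ...)` evaluates to just `"tku.edu.tw"`, because `or` returns its first truthy operand, and the inner test `"tku.edu.tw" or ... in neighbor` is always truthy, since a non-empty string is true. A minimal sketch of a multi-keyword check uses `any()` instead; the keyword tuple comes from the question, while the helper name `matches` and the toy URLs are hypothetical:

```python
import networkx as nx

# Keywords taken from the question; adjust as needed.
KEYWORDS = ("tku.edu.tw", "scu.edu.tw", "cycu.edu.tw", "fcu.edu.tw")

def matches(name, keywords=KEYWORDS):
    """True if the node name contains at least one of the keywords."""
    return any(k in name for k in keywords)

def cleanup(g, keywords=KEYWORDS):
    """Keep nodes that match a keyword or have a matching neighbor (either direction)."""
    g_aux = g.to_undirected()  # independent copy, so removing from g is safe
    for node in g_aux.nodes:
        if not matches(node, keywords):
            if not any(matches(nb, keywords) for nb in g_aux.neighbors(node)):
                g.remove_node(node)

G = nx.MultiDiGraph()
G.add_edge("https://www.tku.edu.tw/", "https://example.com/")
G.add_edge("https://example.org/", "https://example.net/")

cleanup(G)
print(sorted(G.nodes))
```

Here `example.com` survives because it links with a `tku.edu.tw` page, while `example.org` and `example.net` are removed.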
One thing you can do is to keep every node that either has "pu.edu.tw" in its name or has a neighbor whose name contains "pu.edu.tw".
Here's the full code:
import networkx as nx

def cleanup(g):
    g_aux = g.to_undirected()
    for node in g_aux.nodes:
        if "pu.edu.tw" not in node:
            for neighbor in g_aux.neighbors(node):
                if "pu.edu.tw" in neighbor:
                    # Found
                    break
            else:
                # Didn't find pu.edu.tw in any neighbors
                g.remove_node(node)

G = nx.read_gml('test.gml')
cleanup(G)
nx.write_gml(G, "finaltest.gml")
The result is every node with "pu.edu.tw" in its name, plus its neighbors.
Please note that I used an undirected copy of the graph, g_aux = g.to_undirected(), so every neighbor of a "pu.edu.tw" node is kept regardless of the direction of the connecting edge.
Here is some code to check whether any "pu.edu.tw" node has no neighbors:
def check_isolated(g):
    for node in g.nodes:
        if "pu.edu.tw" in node:
            if g.degree[node] == 0:
                print(node)
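If this prints anything before running cleanup, those nodes were already isolated in the original graph, and they will stay isolated, because cleanup only removes nodes and never adds edges: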
G = nx.read_gml('test.gml')
print("before")
check_isolated(G)
print("cleaning...")
cleanup(G)
print("after")
check_isolated(G)
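The question also asks about reconnecting nodes. One possible approach, sketched here as an assumption rather than part of the answer above, is to "bridge" each removed node: before deleting it, add an edge from each of its predecessors to each of its successors, so paths that ran through it survive. The helper name `remove_and_bridge` and the toy URLs are hypothetical:

```python
import networkx as nx

def remove_and_bridge(g, keep):
    """Remove nodes for which keep(node) is False, linking around each one.

    Before a node is removed, an edge is added from each of its predecessors
    to each of its successors, so paths that ran through it survive.
    """
    for node in [n for n in g.nodes if not keep(n)]:
        preds = [p for p in g.predecessors(node) if p != node]
        succs = [s for s in g.successors(node) if s != node]
        for p in preds:
            for s in succs:
                # Skip self-loops and avoid piling up parallel edges.
                if p != s and not g.has_edge(p, s):
                    g.add_edge(p, s)
        g.remove_node(node)

G = nx.MultiDiGraph()
# pu.edu.tw pages linked only through two outside sites (hypothetical URLs).
G.add_edge("https://www.pu.edu.tw/", "https://example.com/")
G.add_edge("https://example.com/", "https://example.org/")
G.add_edge("https://example.org/", "https://alumni.pu.edu.tw/")

remove_and_bridge(G, lambda n: "pu.edu.tw" in n)
print(list(G.edges()))  # [('https://www.pu.edu.tw/', 'https://alumni.pu.edu.tw/')]
```

Because nodes are removed one at a time, a chain of several outside sites collapses into a single edge. The bridged edges carry no attributes, and on a dense crawl each removed node can add up to (predecessors × successors) edges, so the graph may grow before it shrinks.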