python nlp nlu

Extract a path of dependency relations from the ROOT to a token in spaCy?


How can I extract the path of dependency relations from the ROOT to a given token with spaCy? The code I have prints the head and relation of every token rather than a single path:

import spacy

sentence = "I saw the man with a telescop"

nlp = spacy.load('en_core_web_sm')  # the bare 'en' shortcut was removed in spaCy 3
doc = nlp(sentence)

for sent in doc.sents:
    for token in sent:
        print("{}\t{}\t{}\t{}".format(token.i, token.text, token.head, token.dep_))

Solution

  • The dependency tree is basically a graph, so one way to find the (shortest) path to the ROOT is to use a graph library such as networkx. Let's say you want to extract the path from the token "telescop" to the root. Then you could try something like this:

    import spacy
    import networkx as nx
    
    sentence = "I saw the man with a telescop"
    
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(sentence)
    edges = []
    
    for sent in doc.sents:
        for token in sent:
            print("{}\t{}\t{}\t{}".format(token.i, token.text, token.head, token.dep_))
            if token.dep_ == "ROOT":
                # Use the lowercased form so it matches the node names below
                target = token.lower_
            # One undirected edge per head/child dependency arc
            for child in token.children:
                edges.append((token.lower_, child.lower_))
    
    # Nodes are keyed by lowercased text, so repeated words share one node
    graph = nx.Graph(edges)
    print(nx.shortest_path(graph, source="telescop", target=target))
    

    Result:

    0   I   saw nsubj
    1   saw saw ROOT
    2   the man det
    3   man saw dobj
    4   with    saw prep
    5   a   telescop    det
    6   telescop    with    pobj
    ['telescop', 'with', 'saw']
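
Because every token has exactly one head, the same path can also be recovered without a graph library by following `token.head` until the root is reached (in spaCy the root token is its own head). The following is only a minimal sketch of that idea, not part of the original answer: the names `root_to_token_path` and `demo` are made up for illustration, and `demo` assumes that the `en_core_web_sm` model is installed and that "telescop" is token 6, as in the output above.

```python
def root_to_token_path(token):
    """Return [(text, dep_), ...] ordered from the ROOT down to `token`.

    Works on any object exposing .i, .text, .dep_ and .head the way
    spaCy tokens do.
    """
    path = [(token.text, token.dep_)]
    # The root of a sentence is its own head, so stop once head == token.
    while token.head.i != token.i:
        token = token.head
        path.append((token.text, token.dep_))
    path.reverse()  # collected token -> ROOT, returned ROOT -> token
    return path


def demo():
    # Hypothetical usage; requires spaCy and the en_core_web_sm model.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("I saw the man with a telescop")
    print(root_to_token_path(doc[6]))
```

For the parse shown in the result above, this would correspond to the chain saw (ROOT), with (prep), telescop (pobj). Unlike the networkx version, it identifies tokens by position rather than by lowercased text, so repeated words in a sentence do not collide.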