nlp syntaxnet

How to use syntaxnet output


I started playing with SyntaxNet two days ago and I'm wondering how to use or export the output (ASCII tree or CoNLL) in a format that is easy to parse (e.g. JSON, XML, or a Python graph).

Thanks for your help!


Solution

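  • Before the ASCII tree is produced (I assume you are following demo.sh), the input goes through tagging and parsing. Remove the last step in the command pipeline, the one that turns the parser's CoNLL output into the tree, so that the CoNLL output is printed directly.

    Your modified demo.sh file will look like this: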

    PARSER_EVAL=bazel-bin/syntaxnet/parser_eval
    MODEL_DIR=syntaxnet/models/parsey_mcparseface
    [[ "$1" == "--conll" ]] && INPUT_FORMAT=stdin-conll || INPUT_FORMAT=stdin
    
    $PARSER_EVAL \
      --input=$INPUT_FORMAT \
      --output=stdout-conll \
      --hidden_layer_sizes=64 \
      --arg_prefix=brain_tagger \
      --graph_builder=structured \
      --task_context=$MODEL_DIR/context.pbtxt \
      --model_path=$MODEL_DIR/tagger-params \
      --slim_model \
      --batch_size=1024 \
      --alsologtostderr \
       | \
      $PARSER_EVAL \
      --input=stdin-conll \
      --output=stdout-conll \
      --hidden_layer_sizes=512,512 \
      --arg_prefix=brain_parser \
      --graph_builder=structured \
      --task_context=$MODEL_DIR/context.pbtxt \
      --model_path=$MODEL_DIR/parser-params \
      --slim_model \
      --batch_size=1024 \
      --alsologtostderr
    

    You can then run:

    $ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh 1>sample.txt 2>/dev/null
    

    Your result will be stored in sample.txt and looks like this:

    1   Bob _   NOUN    NNP _   2   nsubj   _   _
    2   brought _   VERB    VBD _   0   ROOT    _   _
    3   the _   DET DT  _   4   det _   _
    4   pizza   _   NOUN    NN  _   2   dobj    _   _
    5   to  _   ADP IN  _   2   prep    _   _
    6   Alice   _   NOUN    NNP _   5   pobj    _   _
    7   .   _   .   .   _   2   punct   _   _
    

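    From here you can easily get the head of each word, its part-of-speech tags and its dependency label: split the output on \n to get one token per line, then split each line on whitespace to get the columns. In the output above, column 1 is the token index, column 2 the word, columns 4 and 5 the coarse and fine part-of-speech tags, column 7 the index of the head (0 for the root) and column 8 the dependency label.

    The ASCII tree itself is built from this same CoNLL output.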
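
To get from the CoNLL text to JSON or a Python structure, the splitting described above can be sketched roughly as follows. This is a minimal sketch, not part of SyntaxNet: the function names `parse_conll` and `children_by_head` and the column names in `FIELDS` are my own choices for illustration, and it assumes every token line has the ten whitespace-separated columns shown in the sample output.

```python
import json

# Column names for the 10-column CoNLL layout. The names are this sketch's
# own choice (hypothetical), not identifiers defined by SyntaxNet.
FIELDS = ["id", "form", "lemma", "cpostag", "postag",
          "feats", "head", "deprel", "phead", "pdeprel"]


def parse_conll(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line separates sentences.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()  # splits on tabs or runs of spaces
        token = dict(zip(FIELDS, cols))
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        current.append(token)
    if current:
        sentences.append(current)
    return sentences


def children_by_head(sentence):
    """Map each head index (0 = artificial root) to its dependents' indices."""
    tree = {}
    for token in sentence:
        tree.setdefault(token["head"], []).append(token["id"])
    return tree


def to_json(sentences):
    """Serialize the parsed sentences as indented JSON."""
    return json.dumps(sentences, indent=2)
```

With the sample.txt produced above, `to_json(parse_conll(open("sample.txt").read()))` should give one JSON object per token, and `children_by_head` gives a simple adjacency map that can be walked from key 0 to rebuild the tree.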