wordnet n-triples

How to find the lexicographer ID in WordNet's .nt file without a library


I'm trying to link VerbNet with WordNet using the data files each project provides, so that I can work with the data directly:

VerbNet => http://verbs.colorado.edu/verb-index/vn/verbnet-3.3.tar.gz

WordNet => http://wordnet-rdf.princeton.edu/static/wordnet.nt.gz

The verbs in VerbNet have a link to WordNet through their sense_key:

e.g. live%2:31:00::

The structure of a sense_key is:

(lemma)%(part_of_speech_number):(lexical_file_number):(lexicographer_id)::
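
To make the structure concrete, here is a minimal sketch that splits a sense_key into those fields with plain string operations. The names `SenseKey` and `parse_sense_key` are made up for illustration, and the two trailing fields are named `head_word` and `head_id` on the assumption that they follow the WordNet sense index documentation (they are empty in the example above).

```python
from typing import NamedTuple


class SenseKey(NamedTuple):
    """Hypothetical container for the parts of a WordNet sense_key."""
    lemma: str
    ss_type: int       # part_of_speech_number (2 = verb in the example)
    lex_filenum: int   # lexical_file_number
    lex_id: int        # lexicographer_id
    head_word: str     # empty in the example key
    head_id: str       # empty in the example key


def parse_sense_key(key: str) -> SenseKey:
    # Split on the last '%' so the lemma is kept whole.
    lemma, lex_sense = key.rsplit("%", 1)
    # "2:31:00::" -> ["2", "31", "00", "", ""]
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return SenseKey(lemma, int(ss_type), int(lex_filenum), int(lex_id),
                    head_word, head_id)


print(parse_sense_key("live%2:31:00::"))
```

This assumes the key always carries all five colon-separated fields, as in the example; keys with a different shape would need extra handling.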

Parsing the n-triples of the nt file, I have found all the data except the lexicographer_id:

lemma => live 
part_of_speech_number => 2 
lexical_file_number => 31
lexicographer_id => ??

Solution

  • Parsing the wordnet.nt file doesn't seem to give you this information.

    If you download the WordNet 3.1 database from http://wordnetcode.princeton.edu/wn3.1.dict.tar.gz (linked from https://wordnet.princeton.edu/download/current-version), you will find the file "index.sense", which contains entries like these:

    bethel%1:06:00:: 02836245 1 0
    bethink%2:31:00:: 00685046 2 1
    bethink%2:39:00:: 02171205 1 3
    bethlehem%1:15:00:: 08813084 2 0
    

    The format of this file is documented at https://wordnet.princeton.edu/documentation/senseidx5wn

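    Each line has four space-separated fields: sense_key, synset_offset, sense_number and tag_cnt. The first field is the sense_key used in VerbNet, including the lexicographer_id. The second field is the synset_offset, which coincides with the synset identifier in wordnet.nt, so a VerbNet sense_key can be mapped to its synset in wordnet.nt through this file.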

    From "index.sense" you can also get the sense number (the third field), which lets you build names of the form "word.pos.sense_number", such as "man.n.02".
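
As a rough sketch of how the lookup could work without any WordNet library, the snippet below reads lines in the "index.sense" layout shown above into a dictionary keyed by sense_key, and builds a "word.pos.sense_number" style label from an entry. The function names `load_sense_index` and `sense_name` and the `POS_LETTER` table are assumptions made for illustration, and the sample lines are just the four entries quoted above rather than the full file.

```python
# Sample lines copied from the index.sense excerpt above.
SAMPLE = """\
bethel%1:06:00:: 02836245 1 0
bethink%2:31:00:: 00685046 2 1
bethink%2:39:00:: 02171205 1 3
bethlehem%1:15:00:: 08813084 2 0
"""

# Assumed mapping from the part_of_speech_number in the sense_key
# to the one-letter part-of-speech codes used in labels like "man.n.02".
POS_LETTER = {"1": "n", "2": "v", "3": "a", "4": "r", "5": "s"}


def load_sense_index(lines):
    """Map sense_key -> (synset_offset, sense_number, tag_cnt)."""
    index = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        sense_key, synset_offset, sense_number, tag_cnt = parts
        # Keep the offset as a string so leading zeros are preserved.
        index[sense_key] = (synset_offset, int(sense_number), int(tag_cnt))
    return index


def sense_name(sense_key, sense_number):
    """Build a label in the 'word.pos.sense_number' pattern."""
    lemma, lex_sense = sense_key.rsplit("%", 1)
    pos = POS_LETTER[lex_sense.split(":")[0]]
    return f"{lemma}.{pos}.{sense_number:02d}"


index = load_sense_index(SAMPLE.splitlines())
offset, number, _ = index["bethink%2:31:00::"]
print(offset, sense_name("bethink%2:31:00::", number))
```

For the real file, the same function should accept an open file handle, for example `load_sense_index(open("dict/index.sense", encoding="utf-8"))`; the `dict/` path is an assumption about where the archive unpacks. Note that the label built here is the numbered sense of that particular lemma, which may differ from the name a given library assigns to the synset as a whole.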