Tags: nlp, stanford-nlp, named-entity-recognition, named-entity-extraction

Tagging and Training NER dataset


I have a dataset that I want to tag for Named Entity Recognition. The dataset is in Persian. I want to know how I should tag expressions like the following:

*** آقای مهدی کاظمی = Mr Mehdi Kazemi / Mr Will Smith >>> (names with titles) Should I tag the whole expression as a person, or only the first name and last name? (That is, should I also tag "Mr"?)

Mr >> b_per || Mr >> O

Mehdi >> i_per || Mehdi >> b_per

Kazemi >> i_per || Kazemi >> i_per

*** بیمارستان نور = Noor Hospital >>> Should I tag only the name, or both the name and "hospital", as the named entity?

*** Eiffel Tower / The Ministry of Defense (for example, the US DoD) >>> In Persian it is called وزارت دفاع (vezarate defa). Should I tag only "Defense", or the whole expression?

There are many more examples like these for schools, movies, cities, countries, and so on, since in Persian the entity class comes before the name.

I would appreciate it if you could help me with tagging this dataset.


Solution

  • I'll give you some examples from the CoNLL 2003 training data:

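    "Mr." is not tagged as part of the person, so titles are left untagged and only the first and last name are tagged as a person.

    "Columbia Presbyterian Hospital" is tagged as (LOC, LOC, LOC), so "Hospital" is included when it is part of the name.

    "a New York hospital" is tagged as (O, LOC, LOC, O), so a generic "hospital" that is not part of a name is left untagged.

    "Ministry of Commerce" is tagged as (ORG, ORG, ORG), so the whole name is tagged, not just "Commerce".

    By the same logic, I think "Eiffel Tower" should be (LOC, LOC).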
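
    To connect these flat labels to the b_per / i_per notation used in the question, here is a minimal sketch in Python. The helper name `to_bio` and the lowercase `b_` / `i_` prefixes are assumptions made for illustration; they are not part of CoNLL 2003 or of any library. The sketch also assumes that consecutive tokens with the same label belong to one entity, so two adjacent entities of the same type would be merged and need manual correction.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, label)


def to_bio(tagged: List[Token]) -> List[Token]:
    """Convert flat per-token labels (PER, LOC, ORG, O) to b_/i_ labels.

    Assumption: a run of identical non-O labels is a single entity.
    """
    result: List[Token] = []
    previous = "O"
    for word, label in tagged:
        if label == "O":
            bio = "O"
        elif label == previous:
            # Same label as the previous token: continue the entity.
            bio = "i_" + label.lower()
        else:
            # First token of a new entity.
            bio = "b_" + label.lower()
        result.append((word, bio))
        previous = label
    return result


# The examples from the answer, with the title left untagged.
examples = {
    "title": [("Mr", "O"), ("Mehdi", "PER"), ("Kazemi", "PER")],
    "hospital": [("Columbia", "LOC"), ("Presbyterian", "LOC"), ("Hospital", "LOC")],
    "generic": [("a", "O"), ("New", "LOC"), ("York", "LOC"), ("hospital", "O")],
    "ministry": [("Ministry", "ORG"), ("of", "ORG"), ("Commerce", "ORG")],
}

for name, tokens in examples.items():
    print(name, to_bio(tokens))
```

    With this convention, "Mr Mehdi Kazemi" corresponds to the second option in the question (Mr >> O, Mehdi >> b_per, Kazemi >> i_per).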