python nlp keyword-extraction

Python key phrase extraction using the pke module


I am trying to extract key phrases using the pke module (https://github.com/boudinfl/pke). When I run the extraction once, it works perfectly. But when I run it several times in a row, it raises the following error: ZeroDivisionError: float division by zero

My code is as follows:

from pke.unsupervised import TopicRank

# stoplist and json_list() are assumed to be defined elsewhere in the script
extractor = TopicRank()  # created once and shared by every call

def key_phrase_extract(path_to_json):
   # get temp_text.txt from the JSON file at path_to_json
   extractor.load_document(input='temp_text.txt', language="en", max_length=10000000,
                        normalization='stemming')
   extractor.candidate_selection(pos={'NOUN', 'PROPN', 'ADJ'}, stoplist=stoplist)
   extractor.candidate_weighting(threshold=0.74,
                                  method='average')
   kpe_results = []
   for (keyphrase, score) in extractor.get_n_best(n=10, stemming=True):
      kpe_results.append([keyphrase, score])
   print(kpe_results)

for each_json in json_list():
    key_phrase_extract(each_json)

It runs perfectly for the first JSON file, but as soon as it starts on the second one it raises ZeroDivisionError: float division by zero


Solution

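  • I was able to fix the issue. The problem was that the extractor was initialized outside the function, so the same TopicRank instance was reused for every file. Creating a new extractor inside the function, once per document, makes the error go away.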

    from pke.unsupervised import TopicRank

    # stoplist and json_list() are assumed to be defined elsewhere in the script
    def key_phrase_extract(path_to_json):
       extractor = TopicRank()  # fresh extractor for every document
       # get temp_text.txt from the JSON file at path_to_json
       extractor.load_document(input='temp_text.txt', language="en", max_length=10000000,
                            normalization='stemming')
       extractor.candidate_selection(pos={'NOUN', 'PROPN', 'ADJ'}, stoplist=stoplist)
       extractor.candidate_weighting(threshold=0.74,
                                      method='average')
       kpe_results = []
       for (keyphrase, score) in extractor.get_n_best(n=10, stemming=True):
          kpe_results.append([keyphrase, score])
       print(kpe_results)

    for each_json in json_list():
        key_phrase_extract(each_json)
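
For anyone wondering why a shared extractor can misbehave on the second document, here is a toy sketch of the general pattern. I have not traced pke's internals, so `ToyExtractor` and `run` are hypothetical stand-ins invented for illustration and are not pke's real code. The assumption is only that a stateful object keeps some data from the previous document unless it is recreated.

```python
class ToyExtractor:
    """Hypothetical stand-in for a stateful extractor (NOT pke's implementation)."""

    def __init__(self):
        self.words = []
        self.candidates = {}  # state that survives between documents

    def load_document(self, text):
        # Only the word list is replaced; old candidates are not cleared.
        self.words = text.split()

    def candidate_selection(self):
        # Count each word, adding to whatever is already stored.
        for word in self.words:
            self.candidates[word] = self.candidates.get(word, 0) + 1

    def get_n_best(self, n):
        # Highest count first, ties broken alphabetically.
        ranked = sorted(self.candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        return [word for word, _ in ranked[:n]]


def run(extractor, text):
    """Run the toy pipeline on one document and return its top 3 words."""
    extractor.load_document(text)
    extractor.candidate_selection()
    return extractor.get_n_best(3)


docs = ["alpha beta alpha", "gamma delta"]

# One shared instance: results for the second document include stale words.
shared = ToyExtractor()
reused = [run(shared, doc) for doc in docs]

# A fresh instance per document: each result depends only on its own text.
fresh = [run(ToyExtractor(), doc) for doc in docs]

print(reused[1])  # ['alpha', 'beta', 'delta']
print(fresh[1])   # ['delta', 'gamma']
```

In the toy version the leftover state only pollutes the results. In a real library, stale state of this kind could plausibly lead to inconsistent internal data and an error such as the division by zero above, which would be consistent with the fix of creating a new extractor per document.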