
'ی' and 'ک' are not searchable in main memory


I have a data set in main memory that contains a set of Persian sentences. Searching it generally works, but when the keyword contains ی or ک, I get no results.

My search function (updated):

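def word_lookup(self, word, ayas):
    # Each aya is a sequence; index 3 holds the text, indexes 0-2 identify it.
    return_value = []
    try:
        for aya in ayas:
            self.aya_list = aya[3].split()
            # Positions of the words in this aya that contain the keyword.
            pos = []
            for word_cnt, aya_ in enumerate(self.aya_list):
                if word in aya_:
                    pos.append(word_cnt)
            if pos:
                # Record the aya once, with all matching positions.
                return_value.append([aya[0], aya[1], aya[2], pos])
    except Exception as e:
        print(e)
    return return_value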

Calling the function:

word_lookup("my unicode keyword", ayas)  # ayas: the set of ayas to search

How can I solve it?

I use Python 3.


Solution

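  • The Persian letters ی (U+06CC) and ک (U+06A9) are different code points from the Arabic forms of the same letters, and the data presumably stores the Arabic forms, so a keyword typed with the Persian letters never compares equal. You can patch the keyword before searching:

    import re

    # Replace the Persian letters in the keyword with the Arabic forms.
    if re.match(r'\b\w*ی.*', word):
        word = re.sub(r'ی', r'ﻱ', word)

    if re.match(r'\b\w*ک.*', word):
        word = re.sub(r'ک', r'ﻙ', word)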
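
  • A more general variant of the same idea is to fold both the keyword and the text to one form before comparing, so the search works no matter which variant either side uses. This is only a sketch: the names `fold` and `contains` are made up for illustration, and the choice of the Persian letters as the target form is an assumption, not something the data requires.

```python
import unicodedata

# Hypothetical folding table: map the Arabic letters onto the Persian ones.
_FOLD = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def fold(text):
    # NFKC first turns presentation forms such as U+FEF1 and U+FED9
    # into the plain Arabic letters U+064A and U+0643; translate() then
    # maps those onto the Persian code points.
    return unicodedata.normalize("NFKC", text).translate(_FOLD)


def contains(word, text):
    # Compare the folded forms so either spelling of yeh/kaf matches.
    return fold(word) in fold(text)
```

    In word_lookup this would mean testing `contains(word, aya_)` instead of `word in aya_`. If you are unsure which code points a string really holds, `unicodedata.name(ch)` prints the official name of each character.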