
re.sub changes the direction of the replaced Persian/Arabic content


Here is my code:

import re

# Raw string so that \{, \s, \| and \} reach the regex engine unchanged.
CUSTOMIZED_SUB_PATTERN = r"\{{\{{(?:\s)*{tag_key}(?:\s)*\|(?:\s)*([^|}}]+)(?:\s)*\}}\}}"
pattern = re.compile(CUSTOMIZED_SUB_PATTERN.format(tag_key='name'))
title = "عزیز {{ name | default value 1}} سلام"
re.sub(pattern, "محمد", title)

The output:

'عزیز محمد سلام'

But what I want is:

'سلام محمد عزیز'

As you can see, the replacement appears to have reversed the direction (word order) of the sentence.
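One way to check what re.sub actually did is to inspect the result in logical (storage) order instead of relying on how a terminal renders it. The sketch below uses only the standard library (re and unicodedata). It prints the expanded pattern, the captured default value, and the words of the result in the order they are stored. The words come out as عزیز, محمد, سلام, the same order as in title, which suggests the substitution itself does not reorder anything and the apparent reversal comes from how the right-to-left text is displayed.

```python
import re
import unicodedata

# Same template as in the question; a raw string avoids invalid-escape warnings.
CUSTOMIZED_SUB_PATTERN = r"\{{\{{(?:\s)*{tag_key}(?:\s)*\|(?:\s)*([^|}}]+)(?:\s)*\}}\}}"
pattern = re.compile(CUSTOMIZED_SUB_PATTERN.format(tag_key="name"))
title = "عزیز {{ name | default value 1}} سلام"

# str.format turns each doubled brace into a single literal brace.
print(pattern.pattern)  # \{\{(?:\s)*name(?:\s)*\|(?:\s)*([^|}]+)(?:\s)*\}\}

# The capture group holds the placeholder's default value.
print(pattern.search(title).group(1))  # default value 1

result = pattern.sub("محمد", title)

# Words in logical (storage) order: the same order as in `title`.
words = result.split()
for index, word in enumerate(words):
    # The first letter of each word has bidi class 'AL' (Arabic Letter, right-to-left).
    print(index, word, unicodedata.bidirectional(word[0]))
```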

Question: How can I fix this issue?


Solution

  • You can use the bidi (python-bidi) and arabic_reshaper libraries to reshape the RTL text and convert it to display order before doing the replacement.

    The get_display() function has a special base_dir argument that accepts 'L' or 'R' and overrides the calculated base level.

    You may try:

    import re
    import arabic_reshaper
    from bidi.algorithm import get_display

    # Raw string so the regex escapes are passed through unchanged;
    # doubled braces become literal braces after str.format().
    CUSTOMIZED_SUB_PATTERN = r"\{{\{{(?:\s)*{tag_key}(?:\s)*\|(?:\s)*([^|}}]+)(?:\s)*\}}\}}"
    pattern = re.compile(CUSTOMIZED_SUB_PATTERN.format(tag_key='name'))

    title = "عزیز {{ name | default value 1}} سلام"
    substr = "محمد"

    # Reshape the letters, then convert them to display order.
    # base_dir='L' forces a left-to-right base direction; by default the direction
    # is taken from the first strong character, i.e. RTL for Arabic/Persian text.
    reshaped_text = arabic_reshaper.reshape(title)
    new_title = get_display(reshaped_text, base_dir='L')
    reshaped_text2 = arabic_reshaper.reshape(substr)
    new_substr = get_display(reshaped_text2, base_dir='L')

    print(pattern.sub(new_substr, new_title))
    

    You can find a sample run of the above implementation here.
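To make the role of base_dir more concrete, here is a simplified, standard-library-only sketch of how a default base direction is chosen under the Unicode Bidirectional Algorithm (rules P2 and P3): the first strong character decides, with class L giving level 0 (left-to-right) and R or AL giving level 1 (right-to-left). The helper name base_level is hypothetical, it ignores isolate controls, and it is not python-bidi's implementation; it only mirrors the idea that passing 'L' or 'R' overrides the computed value.

```python
import unicodedata
from typing import Optional


def base_level(text: str, base_dir: Optional[str] = None) -> int:
    """Hypothetical helper: return 0 for an LTR paragraph, 1 for RTL.

    Simplified version of UBA rules P2/P3 (isolate controls are ignored).
    """
    # An explicit direction overrides the calculated level, like base_dir.
    if base_dir == "L":
        return 0
    if base_dir == "R":
        return 1
    # Otherwise the first strong character (class L, R or AL) decides.
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":
            return 0
        if bidi_class in ("R", "AL"):
            return 1
    # No strong character at all: default to left-to-right.
    return 0


title = "عزیز {{ name | default value 1}} سلام"
print(base_level(title))       # 1: the first strong character is Persian
print(base_level(title, "L"))  # 0: forced left-to-right
```

With the default, the title above would be laid out right-to-left because it starts with a Persian word; passing 'L' is what makes get_display treat it as a left-to-right paragraph in the answer's code.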