python arabic

Removing Arabic Diacritics using Python


I want to filter my text by removing Arabic diacritics using Python.

For example:

Context             Text
Before filtering    اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا
After filtering     اللهم اغفر لنا ولوالدينا

I have found that this can be done using CAMeL Tools, but I am not sure how.


Solution

  • You can use the pyArabic library (installable with pip install pyarabic) like this:

    import pyarabic.araby as araby

    before_filter = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"

    # strip_diacritics removes the diacritic marks and keeps the base letters
    after_filter = araby.strip_diacritics(before_filter)

    print(after_filter)
    # prints: اللهم اغفر لنا ولوالدينا
    

    pyArabic also has narrower filters, each of which strips one class of marks. A filter whose marks do not occur in this string (a final haraka, small letters, tatweel) returns it unchanged:

    araby.strip_harakat(before_filter)  # 'اللّهمّ اغفر لنا ولوالدينا'
    araby.strip_lastharaka(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
    araby.strip_shadda(before_filter)  # 'اللَهمَ اغْفِرْ لنَا ولوالدِينَا'
    araby.strip_small(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
    araby.strip_tashkeel(before_filter)  # 'اللهم اغفر لنا ولوالدينا'
    araby.strip_tatweel(before_filter)  # 'اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا'
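
If you would rather not add a dependency, the same filtering can be sketched with the standard library alone. This is not how pyArabic does it internally; it is a minimal alternative that assumes "diacritics" means the tashkeel marks U+064B to U+0652 (tanwin, fatha, damma, kasra, shadda, sukun) plus the superscript alef U+0670. The names ARABIC_DIACRITICS and remove_diacritics are made up for this example.

```python
import re

# Assumption: the marks to drop are U+064B..U+0652 (tanwin, fatha, damma,
# kasra, shadda, sukun) and U+0670 (superscript alef). Extend the character
# class if your text uses other marks, e.g. Quranic annotation signs.
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def remove_diacritics(text: str) -> str:
    """Return text with the Arabic diacritic marks listed above deleted."""
    return ARABIC_DIACRITICS.sub("", text)


before_filter = "اللَّهمَّ اغْفِرْ لنَا ولوالدِينَا"
after_filter = remove_diacritics(before_filter)

print(after_filter)
```

For the example sentence this should give the same string as the "After filtering" row in the question. As for CAMeL Tools, which the question mentions: as far as I know it offers a comparable helper, dediac_ar in camel_tools.utils.dediac, but check its documentation before relying on that.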