Here is an input example:
['ARTA Travel Group', 'Arta | آرتا', 'ARTAS™ Practice Development', 'ArtBinder', 'Arte Arac Takip App', 'アート建築', 'Arte Brasil Bar & Grill', 'ArtPod Stage', 'Artpollo扫码', 'Artpollo阿波罗-价值最优的艺术品投资电商', '아트홀']
Like above list, I want to remove elements with CHINESE, KOREAN, JAPANESE, ARBIC.
And below is the expected output (english only):
['ARTA Travel Group', 'ARTAS™ Practice Development', 'ArtBinder', 'Arte Arac Takip App', 'Arte Brasil Bar & Grill', 'ArtPod Stage']
You can use regex
and search with unicode range. ™ belongs to Letterlike Symbols which ranges from 2100—214F
; you can either include them all or just pick the specific ones.
import re
s = ['ARTA Travel Group', 'Arta | آرتا', 'ARTAS™ Practice Development', 'ArtBinder', 'Arte Arac Takip App', 'アート建築', 'Arte Brasil Bar & Grill', 'ArtPod Stage', 'Artpollo扫码', 'Artpollo阿波罗-价值最优的艺术品投资电商', '아트홀']
result = [i for i in s if not re.findall("[^\u0000-\u05C0\u2100-\u214F]+",i)]
print (result)
['ARTA Travel Group', 'ARTAS™ Practice Development', 'ArtBinder', 'Arte Arac Takip App', 'Arte Brasil Bar & Grill', 'ArtPod Stage']