I have a dataframe consisting of two columns: (1) Turkish cities, (2) corresponding values.
import pandas as pd

dict_ = {'City': {0: 'ADANA',
1: 'ANKARA',
2: 'ANTALYA',
3: 'AYDIN',
4: 'BALIKESİR',
5: 'BURSA',
6: 'DENİZLİ',
7: 'DÜZCE',
8: 'DİYARBAKIR',
9: 'ELAZIĞ',
10: 'GAZİANTEP',
11: 'GİRESUN',
12: 'HATAY',
13: 'KAHRAMANMARAŞ',
14: 'KARABÜK',
15: 'KARS',
16: 'KAYSERİ',
17: 'KIRIKKALE',
18: 'KIRKLARELİ',
19: 'KIRŞEHİR',
20: 'KOCAELİ',
21: 'KONYA',
22: 'KÜTAHYA',
23: 'MANİSA',
24: 'MARDİN',
25: 'MERSİN',
26: 'MUĞLA',
27: 'ORDU',
28: 'OSMANİYE',
29: 'SAKARYA',
30: 'SAMSUN',
31: 'TRABZON',
32: 'UŞAK',
33: 'YALOVA',
34: 'ZONGULDAK',
35: 'ÇORUM',
36: 'İSTANBUL',
37: 'İZMİR'},
'Value': {0: 15,
1: 25,
2: 19,
3: 2,
4: 6,
5: 5,
6: 3,
7: 1,
8: 1,
9: 1,
10: 7,
11: 2,
12: 31,
13: 5,
14: 1,
15: 1,
16: 4,
17: 5,
18: 1,
19: 1,
20: 6,
21: 4,
22: 2,
23: 1,
24: 1,
25: 5,
26: 5,
27: 4,
28: 3,
29: 2,
30: 3,
31: 2,
32: 2,
33: 1,
34: 2,
35: 2,
36: 221,
37: 6}}
data = pd.DataFrame(dict_)
When I try to capitalize the City column (first letter uppercase, the rest lowercase), I run into a weird character issue.
data['City'].apply(str.capitalize)
The lowercase version of "İ" turns into a character that I cannot identify. For example, looking up its name fails:
import unicodedata
unicodedata.name("i̇")
# TypeError: name() argument 1 must be a unicode character, not str
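The TypeError suggests that the result is not a single character at all. A minimal sketch to check this, assuming the odd character comes from Python's default lowercasing of "İ" (the variable names `lowered` and `names` are only for illustration):

```python
import unicodedata

# Lowercase the dotted capital I (U+0130) with Python's default rules.
lowered = "İ".lower()

# Look up each code point separately instead of the whole string.
names = [unicodedata.name(ch) for ch in lowered]

print(len(lowered), names)
# 2 ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']
```

So the "character" is really a plain "i" followed by a combining dot, which is why `unicodedata.name` rejects it as a whole.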
I tried many solutions but to no avail!
def turkish_title_case(text):
    # Lowercase the Turkish-specific capitals by hand before capitalizing
    turkish_correction = {"İ": "i", "I": "ı", "Ç": "ç", "Ğ": "ğ", "Ü": "ü", "Ş": "ş", "Ö": "ö"}
    for turkish, corrected in turkish_correction.items():
        text = text.replace(turkish, corrected)
    text = text.capitalize()
    # A leading "i" or "ı" both capitalize to "I"; turn it into "İ"
    turkish_correction = {"I": "İ"}
    for turkish, corrected in turkish_correction.items():
        text = text.replace(turkish, corrected)
    return text
Considering that the city names are fixed, this may work for this case.
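A slightly more general sketch, which would also keep a leading dotless "I" intact. It assumes that only the İ/i and I/ı pairs need special handling, since Python already case-maps Ç, Ğ, Ö, Ş and Ü correctly. The name `turkish_capitalize` and the two translation tables are hypothetical, not part of any library:

```python
# Translation tables for the two letter pairs Python's default casing gets wrong for Turkish.
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})


def turkish_capitalize(text):
    """Uppercase the first letter and lowercase the rest, using Turkish I rules."""
    if not text:
        return text
    # Handle the dotted/dotless I explicitly, then let Python case everything else.
    head = text[0].translate(_TR_UPPER).upper()
    tail = text[1:].translate(_TR_LOWER).lower()
    return head + tail


for city in ["İSTANBUL", "DİYARBAKIR", "KIRŞEHİR", "ÇORUM"]:
    print(turkish_capitalize(city))
```

On the dataframe this could be applied with data['City'].apply(turkish_capitalize).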