python pandas string capitalization capitalize

Unknown character when capitalizing Turkish characters


I have a dataframe consisting of two columns: (1) Turkish cities, (2) corresponding values.

dict_ = {'City': {0: 'ADANA',
  1: 'ANKARA',
  2: 'ANTALYA',
  3: 'AYDIN',
  4: 'BALIKESİR',
  5: 'BURSA',
  6: 'DENİZLİ',
  7: 'DÜZCE',
  8: 'DİYARBAKIR',
  9: 'ELAZIĞ',
  10: 'GAZİANTEP',
  11: 'GİRESUN',
  12: 'HATAY',
  13: 'KAHRAMANMARAŞ',
  14: 'KARABÜK',
  15: 'KARS',
  16: 'KAYSERİ',
  17: 'KIRIKKALE',
  18: 'KIRKLARELİ',
  19: 'KIRŞEHİR',
  20: 'KOCAELİ',
  21: 'KONYA',
  22: 'KÜTAHYA',
  23: 'MANİSA',
  24: 'MARDİN',
  25: 'MERSİN',
  26: 'MUĞLA',
  27: 'ORDU',
  28: 'OSMANİYE',
  29: 'SAKARYA',
  30: 'SAMSUN',
  31: 'TRABZON',
  32: 'UŞAK',
  33: 'YALOVA',
  34: 'ZONGULDAK',
  35: 'ÇORUM',
  36: 'İSTANBUL',
  37: 'İZMİR'},
 'Value': {0: 15,
  1: 25,
  2: 19,
  3: 2,
  4: 6,
  5: 5,
  6: 3,
  7: 1,
  8: 1,
  9: 1,
  10: 7,
  11: 2,
  12: 31,
  13: 5,
  14: 1,
  15: 1,
  16: 4,
  17: 5,
  18: 1,
  19: 1,
  20: 6,
  21: 4,
  22: 2,
  23: 1,
  24: 1,
  25: 5,
  26: 5,
  27: 4,
  28: 3,
  29: 2,
  30: 3,
  31: 2,
  32: 2,
  33: 1,
  34: 2,
  35: 2,
  36: 221,
  37: 6}}

import pandas as pd

data = pd.DataFrame(dict_)

When I try to capitalize the City column (first letter uppercase, the rest lowercase), I run into a weird character issue.

data['City'].apply(str.capitalize)

The lowercase version of "İ" turns into a character that I cannot identify. When I try to look it up, I get an error:

import unicodedata
unicodedata.name("i̇")
# TypeError: name() argument 1 must be a unicode character, not str

I tried many solutions but to no avail!


Solution

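    def turkish_title_case(text):
        # Lowercase the Turkish-specific letters by hand first, so that
        # str.capitalize() never has to lowercase "İ" itself.
        turkish_correction = {"İ": "i", "I": "ı", "Ç": "ç", "Ğ": "ğ", "Ü": "ü", "Ş": "ş", "Ö": "ö"}

        for turkish, corrected in turkish_correction.items():
            text = text.replace(turkish, corrected)
        # The text is lowercase now; capitalize() uppercases the first letter.
        text = text.capitalize()

        # capitalize() turns a leading "i" (or "ı") into "I"; map it to the dotted "İ".
        turkish_correction = {"I": "İ"}
        for turkish, corrected in turkish_correction.items():
            text = text.replace(turkish, corrected)

        return text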
    

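    Since the set of city names is fixed, this works for this case. Note that it maps every leading "I" to "İ", so a name that starts with a dotless "I" (e.g. "ISPARTA") would come out as "İsparta"; no city in the list above starts with one.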
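
If the list of names is not fixed, one possible variant is to handle the first letter and the rest of the string separately. The following is only a minimal sketch: the names `turkish_capitalize`, `_TR_LOWER` and `_TR_UPPER` are hypothetical, and it assumes a single-word name in which only the I/ı and İ/i pairs need special treatment (Python's built-in `upper()` and `lower()` already handle Ç, Ğ, Ö, Ş and Ü).

```python
# Hypothetical helper tables: the two Turkish letter pairs that Python's
# locale-independent str methods do not case-map the Turkish way.
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})
_TR_UPPER = str.maketrans({"ı": "I", "i": "İ"})


def turkish_capitalize(text):
    """Uppercase the first letter and lowercase the rest, using Turkish I rules."""
    if not text:
        return text
    # Apply the Turkish mapping first, then let Python handle every other letter.
    head = text[0].translate(_TR_UPPER).upper()
    tail = text[1:].translate(_TR_LOWER).lower()
    return head + tail


for city in ["İZMİR", "KIRIKKALE", "ISPARTA"]:
    print(turkish_capitalize(city))
```

With the dataframe from the question, this could be applied as `data['City'].apply(turkish_capitalize)`.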

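
As for the character that could not be identified: lowercasing "İ" (U+0130) in Python yields two code points, not one, which is also why `unicodedata.name` raises the `TypeError` shown in the question (it accepts a single character only). A small sketch for inspecting them one at a time; the variable names `lowered` and `names` are just for illustration:

```python
import unicodedata

lowered = "\u0130".lower()  # "\u0130" is "İ"; the result holds two code points

# Look up each code point separately instead of passing the whole string.
names = [unicodedata.name(ch) for ch in lowered]

for ch, name in zip(lowered, names):
    print(f"U+{ord(ch):04X} {name}")
# U+0069 LATIN SMALL LETTER I
# U+0307 COMBINING DOT ABOVE
```

The second code point is a combining mark, so it is drawn on top of the preceding "i", which is what makes the lowercase letter look odd.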