python string encoding utf-8 latin

Converting UTF-8 characters to Scandinavian letters


I am struggling to convert a string in which Scandinavian letters appear as UTF-8 byte escapes. For example, I would like to convert the following string:

    test_string = "\xc3\xa4\xc3\xa4abc"

into the form:

    test_string = "ääabc"

The end goal is to send this string to a Slack channel via the API. I did some testing and found that Slack handles Scandinavian letters properly. I have tried the following command:

    test_string = test_string.encode('latin1').decode('utf-8')

but this does not change the string at all.

Same goes for the more brute-force method:

    def simple_scand_convert(string):
        # str.replace returns a new string, so the result has to be returned
        return string.replace("\xc3\xa4", "ä")

Again, this does not change the string at all. Any tips or materials from where I could look for the solution?


Solution

  • I can't reproduce your "reading the soup message from an incoming webhook" code snippet; therefore, my answer is based on hard-coded data and shows in detail how the Python-specific text encodings raw_unicode_escape and unicode_escape work:

    test_string = "\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4"    # hard-coded
    print('test_string                  ', test_string)

    # raw_unicode_escape: code points below 256 become single bytes,
    # literal backslashes are left untouched
    raw = test_string.encode('raw_unicode_escape')
    print('.encode("raw_unicode_escape")', raw)

    # unicode_escape: interprets the \xNN sequences, reads other bytes as Latin-1
    unescaped = raw.decode('unicode_escape')
    print('.decode(    "unicode_escape")', unescaped)

    # recover the raw bytes via Latin-1, then decode them as UTF-8
    print('.encode("latin1").decode()   ',
          unescaped.encode('latin1').decode('utf-8'))
    

    Output: \SO\68069394.py

    test_string                   \xc3\xa5\xc3\xa4___Ã¥Ã¤
    .encode("raw_unicode_escape") b'\\xc3\\xa5\\xc3\\xa4___\xc3\xa5\xc3\xa4'
    .decode(    "unicode_escape") Ã¥Ã¤___Ã¥Ã¤
    .encode("latin1").decode()    åä___åä
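
The three steps above could be wrapped in a small helper along these lines. This is only a sketch: the name fix_escaped_utf8 is hypothetical (it is not from the question or from any library), and it assumes the whole input consists of ASCII text, literal \xNN escapes, or UTF-8 bytes misread as Latin-1. If that assumption does not hold, the sketch simply returns the input unchanged instead of raising.

```python
def fix_escaped_utf8(text: str) -> str:
    """Hypothetical helper: undo literal hex escapes and UTF-8 mojibake.

    Assumption: the input is entirely ASCII, literal hex escapes, or
    UTF-8 bytes that were misread as Latin-1. Anything else is returned as-is.
    """
    try:
        # Step 1: code points below 256 become single bytes; backslashes are kept
        raw = text.encode("raw_unicode_escape")
        # Step 2: interpret \xNN escapes; remaining bytes are read as Latin-1
        unescaped = raw.decode("unicode_escape")
        # Step 3: recover the raw bytes and decode them as UTF-8
        return unescaped.encode("latin1").decode("utf-8")
    except UnicodeError:
        # Input did not match the assumption above (e.g. it was already correct)
        return text


# A raw string literal keeps the backslashes as real characters, which mimics
# escapes that arrived as plain text rather than as bytes
escaped = r"\xc3\xa4\xc3\xa4abc"
print(repr(escaped))               # repr() reveals the literal backslashes
print(fix_escaped_utf8(escaped))   # prints: ääabc
```

Checking repr() of the incoming string first is a quick way to tell whether it contains literal backslashes (this case) or already-decoded characters, in which case no conversion is needed.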