
Filename formatting in Python under Windows


I have two distinct files called:

'╠.txt' and '¦.txt'

This simple code:

import os

# Raw string without a trailing backslash, so the closing quote is not escaped
files = os.listdir(r'E:\pub\private\desktop')
for f in files:
    print f, repr(f), type(f)

prints:

¦.txt '\xa6.txt' <type 'str'>
¦.txt '\xa6.txt' <type 'str'>

I don't understand why I get the byte 0xA6 for the ╠ character instead of 0xCC. I have tried playing around with the encode and decode methods, but without success. I noticed that sys.getfilesystemencoding() returns mbcs, but I can't manage to change it to something like cp437.

Any help is very much appreciated. Thanks!


Solution

  • You have to pass a unicode string to os.listdir and Python will return unicode filenames:

    import os

    # ur"..." is a unicode raw string (Python 2 only): backslashes are kept literally
    path = ur"E:\pub\private\desktop"
    print os.listdir(path)
    # [u'\xa6.txt', u'\u2560.txt']
    

    Windows NT actually uses Unicode for filenames, but I guess Python tries to encode them when you pass an encoded (byte string) path name.
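
As a rough illustration of why the byte-string listing collapses both names into '\xa6.txt': the sketch below assumes the console (OEM) code page is cp437 and that mbcs resolves to cp1252, neither of which the question confirms. Under that assumption, ╠ is 0xCC only in cp437 and has no exact cp1252 encoding, so Windows probably falls back to a similar-looking character (¦, 0xA6) when byte-string names are requested. The names BOX, BAR, OEM, ANSI and can_encode are made up for this example.

```python
from __future__ import print_function

BOX = u'\u2560'  # the box-drawing character from the first filename
BAR = u'\xa6'    # the broken bar from the second filename

# Assumption: OEM (console) code page is cp437, ANSI ("mbcs") is cp1252.
# Check your own system before relying on these values.
OEM, ANSI = 'cp437', 'cp1252'


def can_encode(char, codec):
    """Return True if char has an exact representation in codec."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(repr(BOX.encode(OEM)))   # the 0xCC byte the question expected
print(can_encode(BOX, ANSI))   # cp1252 has no box-drawing characters
print(repr(BAR.encode(ANSI)))  # the 0xA6 byte that os.listdir returned
```

Python's own cp1252 codec raises an error here instead of substituting, so any substitution would come from Windows rather than from the codec. Either way, the original character cannot be recovered from the byte string, which is why asking for unicode names is the safer route.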