Unicode Playlists for Sonos from Python


I'm working to export a small subset of music from my iTunes library to an external drive, for use with a Sonos speaker (via Media Library on Sonos). All was going fine until I came across some Unicode text in track, album and artist names.

I'm moving from iTunes on Mac to a folder structure on Linux (Ubuntu). The file paths all contain the original Unicode names, and these display and play fine from Sonos in the Artist / Album view. The only problem is playlists, which I'm generating with a bit of Python 3 code.

Sonos does not appear to support UTF-8 encoding in .m3u / .m3u8 playlists. The character ÷ was interpreted by Sonos as Ã·, which after a bit of Googling I found was clearly the UTF-8 bytes being read as a single-byte encoding such as Latin-1 (÷ is 0xC3 0xB7 in UTF-8, whilst Ã is U+00C3 and · is U+00B7). I've tried many different ways of encoding it, and just can't get it to recognise tracks with non-standard (non-ASCII?) characters in their names.
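The garbling described above can be reproduced in a couple of lines. This is only a sketch of the symptom, assuming the player decodes the playlist bytes as Latin-1 (or a similar single-byte encoding); that is a guess about Sonos's behaviour, not something confirmed.

```python
# Reproduce the mojibake: UTF-8 bytes decoded with the wrong (single-byte) codec.
original = '\u00f7'                      # the division sign
raw = original.encode('utf-8')           # two bytes: 0xC3 0xB7
garbled = raw.decode('latin-1')          # each byte becomes its own character

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode('latin-1').decode('utf-8')

print(raw, garbled, repaired)
```

If the output shows the two-character string in the middle, the playlist file itself is valid UTF-8 and the problem is on the reading side.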

I then tried .wpl playlists, and thought I'd solved it. Tracks with characters such as ÷ and • in their path now work perfectly, just using those characters in their unicode / UTF-8 form in the playlist file itself.

However, just as I was starting to tidy up and finish off the code, I found some other characters that weren't being handled correctly: ö, å, á and a couple of others. I've tried using these both as their original Unicode characters and as XML numeric character references, e.g. &#769;. Using this format doesn't make a difference to what works and what doesn't: ÷ (&#247;) and • (&#8226;) are fine, whilst ö (o&#776;), å (a&#778;) and á (a&#769;) are not.

I've never really worked with Unicode / UTF-8 before, but having read various guides and how-tos I feel like I'm getting close and am probably just missing something simple. The fact that some Unicode characters work now and others don't makes me think it's got to be something obvious! I'm guessing the difference is that accents modify the previous character, rather than being characters in themselves, but I tried removing the previous letter and that didn't work!
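The guess about accents can be checked with the standard `unicodedata` module. The sketch below assumes the failing names are stored in decomposed form (base letter plus combining mark), which is how file names coming from a Mac often arrive; the sample string is made up for illustration.

```python
import unicodedata

# 'o' followed by U+0308 COMBINING DIAERESIS: renders as one glyph, but is two code points.
decomposed = 'o\u0308'
# NFC normalisation composes the pair into the single code point U+00F6.
composed = unicodedata.normalize('NFC', decomposed)

# The numeric character references differ between the two forms.
refs_decomposed = decomposed.encode('ascii', 'xmlcharrefreplace')
refs_composed = composed.encode('ascii', 'xmlcharrefreplace')

print(len(decomposed), len(composed))
print(refs_decomposed, refs_composed)
```

Characters such as ÷ and • have no decomposed form, which would explain why they are unaffected.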

Within Python itself I'm not doing anything particularly clever. I read in the data from iTunes' XML file using:

    with open(settings['itunes_path'], 'rb') as itunes_handle:
        itunes_library = plistlib.load(itunes_handle)

For export I've tried dozens of different options, but generally something like the below (sometimes with encoding='utf-8' and various other options):

    with open(dest_path, 'w') as playlist_file:
        playlist_file.write(generated_playlist)

Where generated_playlist is the result of extracting and filtering data from itunes_library, having run urllib.parse.unquote() on any iTunes XML data.
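To see exactly what ends up in the file, it can help to decode one location and inspect the bytes written. This is a minimal sketch with a made-up, percent-encoded path and a temporary file; `location` and `dest_path` here are illustrative, not values from the real library.

```python
import os
import tempfile
import urllib.parse

# Hypothetical fragment of an iTunes 'Location' value (percent-encoded UTF-8).
location = 'Bjo%CC%88rk/01%20Track.mp3'
path = urllib.parse.unquote(location)    # escapes are decoded as UTF-8 by default

with tempfile.TemporaryDirectory() as tmp:
    dest_path = os.path.join(tmp, 'demo.m3u8')
    # Pass the encoding explicitly; without it, open() uses the platform default.
    with open(dest_path, 'w', encoding='utf-8', newline='\n') as playlist_file:
        playlist_file.write(path + '\n')
    with open(dest_path, 'rb') as playlist_file:
        raw = playlist_file.read()

print(len(path), raw)
```

Here `%CC%88` decodes to a separate combining character, so the unquoted name is one code point longer than it looks on screen.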

Any thoughts or tips on where to look would be very much appreciated! I'm hoping that to someone who understands Unicode better the answer will be really really obvious! Thanks!

Current version of the code available here: https://github.com/dwalker-uk/iTunesToSonos


Solution

  • With thanks to @lenz for the suggestions above, I do now have unicode playlists fully working with Sonos.

    A couple of critical points that should save someone else a lot of time:

    In Python 3, converting a path from iTunes XML format into something suitable for a .wpl playlist on Sonos needs the following key steps:

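    import unicodedata
    import urllib.parse

    # Strip the library's 'Music Folder' prefix and re-root the path under 'Media/'
    left = len(itunes_library['Music Folder'])
    path_relative = 'Media/' + itunes_library['Tracks'][track_id]['Location'][left:]
    # Decode the percent-escapes used in the iTunes location URLs
    path_unquoted = urllib.parse.unquote(path_relative)
    # Compose base letter + combining accent pairs into single characters (NFC)
    path_norm = unicodedata.normalize('NFC', path_unquoted)
    # Escape the characters that are special inside an XML attribute value
    path = path_norm.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace('"', '&quot;')

    playlist_wpl += '<media src="%s"/>\n' % path

    # Write pure ASCII, with every non-ASCII character as a numeric character reference
    with open(pl_path, 'wb') as pl_file:
        pl_file.write(playlist_wpl.encode('ascii', 'xmlcharrefreplace'))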
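The steps above can also be wrapped in one function and tried without a real library. This is a rough sketch, not the code from the linked repository: `wpl_media_line`, the sample `music_folder` and the track location are all hypothetical, and `xml.sax.saxutils.escape` is swapped in for the chained `replace` calls.

```python
import unicodedata
import urllib.parse
from xml.sax.saxutils import escape


def wpl_media_line(location, music_folder, prefix='Media/'):
    """Build one <media> line for a .wpl playlist from an iTunes Location URL."""
    relative = prefix + location[len(music_folder):]
    unquoted = urllib.parse.unquote(relative)
    composed = unicodedata.normalize('NFC', unquoted)
    # escape() handles &, < and >; the extra mapping covers double quotes.
    escaped = escape(composed, {'"': '&quot;'})
    return '<media src="%s"/>\n' % escaped


# Made-up values standing in for itunes_library['Music Folder'] and a track Location.
music_folder = 'file:///Users/me/Music/iTunes/iTunes%20Media/'
location = music_folder + 'Bjo%CC%88rk/Debut/01%20Human%20Behaviour.mp3'

line = wpl_media_line(location, music_folder)
data = line.encode('ascii', 'xmlcharrefreplace')
print(data)
```

The resulting bytes are plain ASCII, so the playlist no longer depends on which encoding the player assumes when reading it.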
    

    A full working demo for exporting from iTunes for use in Sonos (or anything else) as .wpl is available here: https://github.com/dwalker-uk/iTunesToSonos

    Hope that helps someone!