python audio audio-processing mutagen audioformat

Using Mutagen to process all accepted file types


What do I need to do in order to process every file type accepted by Mutagen: .ogg, .apev2, .wma, .flac, .mp4, and .asf? (I excluded MP3 because it already has the most documentation.)

I'd appreciate it if someone who knows how this is done could provide some pseudo-code to explain the techniques used. The main tags I want to extract are the title and artist of each file, plus the album if available.

Where to start?


Solution

  • Each tag type has different names for the fields, and they don't all map perfectly.

    If you just want a handful of the most important fields, Mutagen has "easy" wrappers for ID3v2 and MP4/ITMF. So, for example, you can do this:

    >>> import mutagen
    >>> m = mutagen.File(path, easy=True)
    >>> m['title']
    ['Sunshine Smile']
    >>> m['artist']
    ['Adorable']
    >>> m['album']
    ['Against Perfection']
    

    But this will only work for these two file formats. Vorbis, Metaflac, APEv2, and WMT tags are essentially free-form key: value or key: [list of values] mappings. Vorbis does have a recommended set of names for common comment fields, and WM has a set of fields that are mapped by the WMP GUI and the .NET API, but Metaflac and APEv2 don't even have that. In fact, it's pretty common to see both "Artist", from the old ID3v1 field name, and "ARTIST", from Vorbis, in Metaflac comments.
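Because these free-form formats can carry the same field under differently cased names, one option is to fold the keys to a single case before looking anything up. The following is a minimal sketch, assuming the tags have already been copied into a plain dict that maps each key to a list of values. The helper name `fold_keys` is hypothetical and not part of Mutagen.

```python
def fold_keys(tags):
    """Merge entries whose keys differ only by case, e.g. 'Artist' and 'ARTIST'.

    `tags` is assumed to be a plain dict of key -> list of values.
    """
    folded = {}
    for key, values in tags.items():
        bucket = folded.setdefault(key.upper(), [])
        for value in values:
            if value not in bucket:  # skip exact duplicates, keep first-seen order
                bucket.append(value)
    return folded


if __name__ == "__main__":
    raw = {"Artist": ["Adorable"], "ARTIST": ["Adorable"], "Title": ["Sunshine Smile"]}
    print(fold_keys(raw))
```

After folding, a single lookup such as `folded.get("ARTIST")` covers both spellings.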

    And even for ID3v2, the mappings aren't perfect: iTunes shows the "TPE1" frame as "Artist" and "TPE2" as "Album Artist", while Foobar2000 shows TPE2 as "Artist" and TXXX:ALBUM ARTIST as "Album Artist".

    So, to do this right, you have to look at the iTMF, Vorbis comment, ID3v2 (or see Wikipedia), and WMT specifications, then look at the files you actually have and add some heuristics to decide how to get what you want from them.

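    For example, you might try something like this:

    >>> import mutagen
    >>> m = mutagen.File(path)  # None if the file type is not recognized
    >>> artist = None           # stays None if no candidate tag is present
    >>> for tag in ('TPE1', 'TPE2', '©ART', 'Author', 'Artist', 'ARTIST',
    ...             'TRACK ARTIST', 'TRACKARTIST', 'TrackArtist', 'Track Artist'):
    ...     try:
    ...         artist = str(m[tag][0])
    ...         break
    ...     except (KeyError, ValueError, IndexError):
    ...         # Tag missing, key not valid for this format, or empty value list
    ...         pass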
    

    A better solution would be to switch on the tag type and only try the appropriate fields for the format.
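One way to switch on the tag type is sketched below, under stated assumptions. The key names in `FIELD_KEYS` are common conventions for each format, not an authoritative or complete mapping. The helper names (`first_text`, `pick_fields`, `tag_family`, `read_common_tags`) are hypothetical. Treating every unrecognized tag object as Vorbis-style is a simplification.

```python
# Candidate keys per tag family (assumed conventions; extend for your files).
FIELD_KEYS = {
    "id3":    {"title": ["TIT2"], "artist": ["TPE1"], "album": ["TALB"]},
    "mp4":    {"title": ["\xa9nam"], "artist": ["\xa9ART"], "album": ["\xa9alb"]},
    "asf":    {"title": ["Title"], "artist": ["Author"], "album": ["WM/AlbumTitle"]},
    "apev2":  {"title": ["Title"], "artist": ["Artist"], "album": ["Album"]},
    "vorbis": {"title": ["title"], "artist": ["artist"], "album": ["album"]},
}


def first_text(value):
    """Return the first entry of a (possibly multi-valued) tag as text."""
    if isinstance(value, str):
        return value
    try:
        return str(value[0])
    except IndexError:   # empty list of values
        return None
    except TypeError:    # value does not support indexing
        return str(value)


def pick_fields(tags, family):
    """Look up title/artist/album using only the keys for one tag family."""
    found = {}
    for field, keys in FIELD_KEYS[family].items():
        for key in keys:
            try:
                text = first_text(tags[key])
            except (KeyError, ValueError):  # missing, or key rejected by the format
                continue
            if text:
                found[field] = text
                break
    return found


def tag_family(tags):
    """Classify a Mutagen tag object; anything unrecognized is treated as Vorbis."""
    from mutagen.apev2 import APEv2
    from mutagen.asf import ASFTags
    from mutagen.id3 import ID3
    from mutagen.mp4 import MP4Tags
    known = (("id3", ID3), ("mp4", MP4Tags), ("asf", ASFTags), ("apev2", APEv2))
    for family, cls in known:
        if isinstance(tags, cls):
            return family
    return "vorbis"


def read_common_tags(path):
    """Open a file with Mutagen and return whichever common fields are present."""
    import mutagen
    audio = mutagen.File(path)
    if audio is None or audio.tags is None:  # unknown file type or untagged file
        return {}
    return pick_fields(audio.tags, tag_family(audio.tags))
```

Calling `read_common_tags(path)` would return a dict such as `{'title': ..., 'artist': ..., 'album': ...}` containing only the fields that were found. Keeping the lookup table separate from the dispatch makes it easy to add fallback keys per format without changing the code.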

    Fortunately, other people have done this work for you. You can find almost all the information people have gathered about how different players/taggers map values to each format at the Hydrogen Audio forums and wiki, and various other projects have turned that information into simple tag-mapping tables that you can just pick up and borrow for your code, like this one from MusicBrainz. MusicBrainz Picard even has a wrapper around Mutagen that lets you use a consistent set of metadata names (the ones described here) with all tag types.