c++ id3v2

Parsing ID3v2 from char* buffer using TagLib


I have a set of audio files (DSDIFF, specification) to which somebody appended an ID3v2 tag. This does not conform to the file's standard, and thus standard ID3v2 parsers (like TagLib) don't recognize the audio file and refuse to parse it. (Why doing non-standard stuff like this seemed like a good idea is beyond me.)

I can manually parse the file and extract the raw ID3 tag (as a char* + size); however, I'm not sure how to proceed from here and get the values of the individual frames inside the raw tag.
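For context, the manual extraction step can be sketched roughly as below. This is a minimal sketch, not my exact code: it assumes the whole file is already in memory, that every DSDIFF chunk has a 4-byte ID followed by an 8-byte big-endian size (with a pad byte after odd-sized payloads), and that the non-standard "ID3 " chunk sits at the top level of the FRM8 container. The names `ChunkSpan`, `read_be64` and `find_chunk` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Location of a chunk's payload inside the file image (hypothetical helper type).
struct ChunkSpan {
    std::size_t offset;  // index of the first payload byte
    std::uint64_t size;  // payload length in bytes, excluding any pad byte
};

// Read a big-endian 64-bit integer starting at p.
inline std::uint64_t read_be64(const unsigned char* p) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | p[i];
    return v;
}

// Walk the top-level chunks inside the FRM8 container and return the payload
// of the first chunk whose 4-character ID matches `id`.
inline std::optional<ChunkSpan> find_chunk(const std::vector<unsigned char>& file,
                                           std::string_view id) {
    constexpr std::size_t header = 12;       // 4-byte ID + 8-byte size
    constexpr std::size_t first_chunk = 16;  // "FRM8" + size + form type "DSD "
    if (file.size() < first_chunk) return std::nullopt;
    if (std::string_view(reinterpret_cast<const char*>(file.data()), 4) != "FRM8")
        return std::nullopt;

    std::size_t pos = first_chunk;
    while (pos <= file.size() && file.size() - pos >= header) {
        std::string_view chunk_id(reinterpret_cast<const char*>(file.data() + pos), 4);
        const std::uint64_t size = read_be64(file.data() + pos + 4);
        const std::size_t payload = pos + header;
        if (size > file.size() - payload) return std::nullopt;  // truncated chunk
        if (chunk_id == id) return ChunkSpan{payload, size};
        // Skip the payload, plus one pad byte if the payload length is odd.
        pos = payload + static_cast<std::size_t>(size + (size & 1));
    }
    return std::nullopt;
}
```

Calling `find_chunk(file, "ID3 ")` then yields the offset and length of the raw tag, which is the char* + size pair mentioned above.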

I would like to use TagLib to parse the char*, but I have never used the library before. I'm also okay with using other libraries. I would not like to write my own parser from scratch.

Here is what I have tried so far:

Attempt 1

auto file_name = std::filesystem::path("location/of/audio.dff");
auto file_stream = std::fstream(file_name);

// ... parsing the DSDIFF section of the file
// until I encounter the "ID3 " chunk.

auto id3_start = file_stream.tellg();
TagLib::FileRef taglib_file(file_name.string().c_str());

// when executing, taglib_file.isNull() evaluates to true
if (taglib_file.isNull()) 
    std::cerr << "TagLib can't read the file." << std::endl;

auto tag = TagLib::ID3v2::Tag(taglib_file.file(), id3_start);
// ... handle the tag

This approach doesn't work, because TagLib doesn't know how to parse the DSDIFF format. As a result, taglib_file is null (taglib_file.isNull() returns true) and no tags are read.

Attempt 2

auto file_name = std::filesystem::path("location/of/audio.dff");
auto file_stream = std::fstream(file_name);

// ... parsing the DSDIFF section of the file
// until I encounter "ID3 ".
// read the size of the tag and store it in `size`

char* buff = new char[size];
file_stream.read(buff, size);

// How to parse the id3tag here?

paddy suggested using

auto tag = TagLib::ID3v2::Tag().parse(TagLib::ByteVector(buff, size));

Unfortunately, parse is a protected method of Tag. I tried inheriting from Tag and adding a thin wrapper, from_buffer, that internally calls parse, but that didn't work either.

Suggestions are highly appreciated :)


I am aware of a similar question: Taglib read ID3v2 tags from arbitrary file c++

However, the answer there was "just use the specific parser for your file type". In my case, this parser does not exist, because the file type doesn't actually support ID3 tags; somebody just appended them anyway.


Solution

  • AmigoJack got me on the right track to figuring out a solution. While TagLib::File is an abstract class and can't be instantiated directly, one of the existing format-specific parsers can be used instead. It is possible to create an in-memory stream that contains only the ID3 tag and use the MPEG parser to read it.

    Here is the relevant code snippet:

    // ... parse the DSDIFF file and convert the data into FLAC format
    // until ID3 tag
    
    const struct {
        int length; // number of bytes in ID3 tag
        char* data; // filled while parsing the DSDIFF file
    } *id3_data;
    
    const auto data = TagLib::ByteVector(id3_data->data, id3_data->length);
    auto stream = TagLib::ByteVectorStream(data);
    auto file = TagLib::MPEG::File(&stream, TagLib::ID3v2::FrameFactory::instance());
    
    /* copy all supported tags to the FLAC file*/
    flac_tags.tag()->setProperties(file.tag()->properties());
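Before handing the buffer to the MPEG parser, it may be worth checking that it really starts with an ID3v2 header and that the declared tag size fits inside the extracted chunk. The sketch below does this with the standard library only. It relies on the 10-byte ID3v2 header layout ("ID3", two version bytes, one flags byte, four synchsafe size bytes) and ignores the optional ID3v2.4 footer. The names `Id3v2Header`, `read_id3v2_header` and `id3v2_tag_fits` are hypothetical helpers, not part of TagLib.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Decoded fields of the 10-byte ID3v2 header (hypothetical helper type).
struct Id3v2Header {
    std::uint8_t major;       // e.g. 3 for ID3v2.3
    std::uint8_t revision;
    std::uint8_t flags;
    std::uint32_t body_size;  // tag size excluding the 10-byte header
};

// Parse the header at the start of `data`; returns nothing if it is malformed.
inline std::optional<Id3v2Header> read_id3v2_header(const char* data, std::size_t length) {
    if (data == nullptr || length < 10 || std::memcmp(data, "ID3", 3) != 0)
        return std::nullopt;
    const auto* b = reinterpret_cast<const unsigned char*>(data);
    // Version bytes are never 0xFF in a valid header.
    if (b[3] == 0xFF || b[4] == 0xFF) return std::nullopt;

    // The size is "synchsafe": four bytes carrying 7 bits each, top bit clear.
    std::uint32_t size = 0;
    for (int i = 6; i < 10; ++i) {
        if (b[i] & 0x80) return std::nullopt;
        size = (size << 7) | b[i];
    }
    return Id3v2Header{b[3], b[4], b[5], size};
}

// True if the header is valid and header + body fit into the buffer.
inline bool id3v2_tag_fits(const char* data, std::size_t length) {
    const auto h = read_id3v2_header(data, length);
    return h && static_cast<std::size_t>(h->body_size) + 10 <= length;
}
```

With the snippet above, this would be called as `id3v2_tag_fits(id3_data->data, id3_data->length)` before constructing the ByteVector, so that a truncated or mislabeled chunk is caught early.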