java apache mime-types apache-tika

Apache Tika detects an AC3 file as application/octet-stream instead of audio/ac3


I provide an AC3 audio file as input, obtain its InputStream, and pass it to Apache Tika.

Although the library lists audio/ac3 in its MIME types XML, it fails to identify the type and returns application/octet-stream instead. Detection works fine for the other standard media types.

Does anyone know how to fix this?

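// Pass the file name as a hint, in addition to the stream content
Metadata metadata = new Metadata();
metadata.add(Metadata.RESOURCE_NAME_KEY, fileName);
TikaConfig config = TikaConfig.getDefaultConfig();
MimeTypes mimeTypes = config.getMimeRepository();
// detect() needs a stream that supports mark/reset, hence the BufferedInputStream
MediaType tikaMediaType = mimeTypes.detect(new BufferedInputStream(inputStream), metadata);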

Solution

  • You need to use a newer version of Apache Tika!

    Specifically, you need Apache Tika 2.0, or a nightly build or a build from GitHub dated 2017-12-24 or later.

    The MIME detection magic for AC3 and EAC3 files was only recently added, via this commit to the project.
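
If upgrading is not an option, one possible stopgap is to check the magic bytes yourself before trusting an application/octet-stream result. The sketch below is not Tika's implementation and uses only the JDK. It relies on the fact that AC-3 frames begin with the sync word 0x0B77; the class name Ac3Sniffer and the method looksLikeAc3 are made up for illustration. E-AC-3 streams start with the same sync word, so this check alone cannot tell the two apart.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Apache Tika.
final class Ac3Sniffer {
    // AC-3 (and E-AC-3) frames start with the 16-bit sync word 0x0B77.
    private static final int SYNC_BYTE_0 = 0x0B;
    private static final int SYNC_BYTE_1 = 0x77;

    private Ac3Sniffer() {
    }

    /**
     * Returns true if the stream starts with the AC-3 sync word.
     * The stream must support mark/reset; its position is restored afterwards.
     */
    static boolean looksLikeAc3(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        try {
            int b0 = in.read(); // -1 at end of stream, which never matches
            int b1 = in.read();
            return b0 == SYNC_BYTE_0 && b1 == SYNC_BYTE_1;
        } finally {
            in.reset();
        }
    }
}
```

You could call Ac3Sniffer.looksLikeAc3(...) on the same BufferedInputStream you hand to Tika, and treat the file as audio/ac3 only when Tika returns application/octet-stream and the sync word matches. Two bytes are a weak signal on their own, so combining the check with the file extension is probably wise.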