
Apache Tika detect() returns inconsistent results


I am trying to determine the content type of a file using Apache Tika.

While doing so, I ran into this inconsistent behaviour:

final Tika tika = new Tika();

String fileType = tika.detect(uploadedInputStream);
System.out.println(fileType);
String newFileType = tika.detect(uploadedInputStream);
System.out.println(newFileType);

The above code gives me this output:

application/pdf
application/octet-stream

I expected application/pdf in both cases.

Can anyone explain why this happens, and how I can get the intended result?


Solution

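  • Wrapping the InputStream in a TikaInputStream, as suggested in the comments, solved the problem. As far as I can tell, detect() reads the first bytes of the stream and can only rewind it afterwards if the stream supports mark/reset, which TikaInputStream does: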

        final Tika tika = new Tika();
        TikaInputStream tikaInputStream = TikaInputStream.get(uploadedInputStream);
        String fileType = tika.detect(tikaInputStream);
        System.out.println(fileType);
        final Tika newTika = new Tika();
        String newFileType = newTika.detect(tikaInputStream);
        System.out.println(newFileType);
    

    Output:

         application/pdf
         application/pdf
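
To illustrate why the wrapper helps, here is a minimal sketch that uses only java.io and does not call Tika at all. The class MarkResetDemo and the helpers sniff and noMark are hypothetical names made up for this example; sniff is a crude stand-in for a detector that peeks at the first four bytes. It assumes, as I understand the Tika facade to behave, that a stream without mark support gets wrapped in a temporary buffer, so the bytes read for detection are lost to the caller once that buffer is discarded.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class MarkResetDemo {

    // Hypothetical stand-in for a detector: peeks at the first 4 bytes.
    // A stream without mark support is wrapped in a throwaway buffer,
    // so the underlying stream is NOT rewound afterwards.
    static String sniff(InputStream in) throws IOException {
        InputStream s = in.markSupported() ? in : new BufferedInputStream(in);
        s.mark(4);
        byte[] head = new byte[4];
        int n = s.readNBytes(head, 0, 4); // requires Java 9+
        s.reset();                        // rewinds s, not the stream beneath it
        return new String(head, 0, n, StandardCharsets.US_ASCII);
    }

    // Hypothetical helper: an in-memory stream that reports no mark support,
    // similar to many upload or socket streams.
    static InputStream noMark(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 sample".getBytes(StandardCharsets.US_ASCII);

        // Raw stream: the first call consumes the bytes, the second sees nothing.
        InputStream raw = noMark(data);
        System.out.println("[" + sniff(raw) + "]"); // prints [%PDF]
        System.out.println("[" + sniff(raw) + "]"); // prints []

        // Wrapped once and reused: both calls see the same leading bytes.
        InputStream buffered = new BufferedInputStream(noMark(data));
        System.out.println("[" + sniff(buffered) + "]"); // prints [%PDF]
        System.out.println("[" + sniff(buffered) + "]"); // prints [%PDF]
    }
}
```

The same idea applies to the Tika code above: create the mark-capable wrapper once and pass that same wrapper to every detect() call, rather than passing the raw upload stream each time.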