Tags: java, algorithm, file-type

How to reliably detect file types?


Objective: given a file, determine whether it is of a given type (XML, JSON, Properties, etc.).

Consider the case of XML. Until we ran into this issue, the following approach worked fine:

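    import java.io.File;
    import org.dom4j.DocumentException;

    // saxReader (an org.dom4j.io.SAXReader) and logger are fields of the enclosing class
    boolean isXml(File f) {
        try {
            saxReader.read(f); // throws DocumentException if f is not well-formed XML
        } catch (DocumentException e) {
            logger.warn("  - File is not XML: " + e.getMessage());
            return false;
        }
        return true;
    }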

As expected, when the XML is well-formed, the parse succeeds and the method returns true. If anything goes wrong and the file can't be parsed, it returns false.

This breaks down, however, when the file is malformed XML: it is still an XML file, but the parser rejects it, so the method reports it as not XML.
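To make the failure mode concrete, here is a minimal sketch of the same parse-and-catch check written against the JDK's built-in SAX parser (javax.xml.parsers) instead of dom4j, so it runs without extra dependencies. The class and method names are hypothetical. A document with a mismatched end tag is rejected exactly like a file that is not XML at all, which is the ambiguity described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true only if the whole input parses as well-formed XML.
    // Any parse error yields false, whether the input is "not XML"
    // or "XML with a mistake in it": this check cannot tell them apart.
    public static boolean isWellFormedXml(InputStream in) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content events and rethrows fatal errors.
            parser.parse(in, new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience wrapper for in-memory strings (encoded as UTF-8).
    public static boolean check(String text) {
        return isWellFormedXml(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        System.out.println(check("<config><name>demo</name></config>")); // true
        System.out.println(check("<config><name>demo</config>"));        // false: malformed XML
        System.out.println(check("{\"name\": \"demo\"}"));               // false: not XML at all
    }
}
```

The second and third inputs produce the same result, so a boolean parse check alone cannot separate "broken XML" from "some other format".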

I'd rather not rely on the .xml extension (it fails all the time), on searching the file for a <?xml version="1.0" encoding="UTF-8"?> declaration, and so on.

Is there another way this can be handled?

What would you need to see inside the file to suspect that it may be XML even though a DocumentException was caught? This is needed for parsing purposes.
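One possible answer, offered as a heuristic rather than a reliable rule: a file that has failed a full parse can still be flagged as "probably XML" if its first meaningful characters look like XML markup. The sketch below uses hypothetical names and assumes UTF-8 or another ASCII-compatible encoding (UTF-16 files are not handled). It skips an optional UTF-8 byte order mark and leading whitespace, then checks for `<` followed by `?` (declaration or processing instruction), `!` (comment or DOCTYPE), or a character that could start an element name. The 512-byte read limit is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class XmlSniffer {

    // Heuristic only: inspects the first bytes of the content, never parses it.
    // Assumes an ASCII-compatible encoding such as UTF-8.
    public static boolean looksLikeXml(byte[] head) {
        int i = 0;
        // Skip a UTF-8 byte order mark (EF BB BF) if present.
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip XML whitespace; not allowed before <?xml ...?>, but common in broken files.
        while (i < head.length && (head[i] == ' ' || head[i] == '\t'
                || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        if (i + 1 >= head.length || head[i] != '<') {
            return false;
        }
        char next = (char) (head[i + 1] & 0xFF);
        // "<?", "<!" or the first character of an element name.
        return next == '?' || next == '!' || next == '_' || next == ':'
                || Character.isLetter(next);
    }

    // Reads at most the first 512 bytes of the file and sniffs them.
    public static boolean looksLikeXml(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeXml(in.readNBytes(512));
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeXml("<config><name>demo</config>".getBytes(StandardCharsets.UTF_8)));   // true
        System.out.println(looksLikeXml("  <?xml version=\"1.0\"?><a>".getBytes(StandardCharsets.UTF_8))); // true
        System.out.println(looksLikeXml("{\"name\": \"demo\"}".getBytes(StandardCharsets.UTF_8)));          // false
        System.out.println(looksLikeXml("name=demo".getBytes(StandardCharsets.UTF_8)));                     // false
    }
}
```

Combined with a parse check, this gives three outcomes instead of two: well-formed XML, probably-XML-but-broken (report the parser's error message), and not XML. `InputStream.readNBytes(int)` requires Java 11 or later.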


Solution

  • File type detection tools: