
How do you write a magic file test pattern to match the end of a file?


I am beginning to wonder if this is even possible as multiple searches on SO, Google, Bing and linuxquestions.org have turned up nothing.

I am interested in extending the magic patterns located in /usr/share/magic (used by the file(1) utility) to recognize files based on data at or near the end of the file. I have been able to do this for the beginning of a file, as well as for arbitrary offsets into the file from the beginning.

The man page does a pretty good job of illustrating the standard use cases; unfortunately, there does not seem to be a way to index from the end of the file rather than the beginning. The only workaround I could come up with was a scripted approach using tac and/or lreverse, but I suspect these are unfriendly to binary data.

Also, I would like to avoid any other scripted processing. I feel like this should be doable with the right file magic. Any ideas?


Solution

  • It's not possible. file(1) is designed to work with pipes too, and you cannot use lseek(2) on a pipe to jump to the end of the data. Reading the whole input until the end would be very slow (and file(1) tries hard to be fast), and if the input really is a pipe, file(1) might never encounter the end at all, which would be even worse.

    As for the documentation: with open source software, the source code itself is the ultimate documentation. If you get stuck in a case like this, it is always a good idea to have a look. The function file_or_fd() in src/magic.c gives the clue. Use the Source, Luke! ;-)

    In your specific case, I would take a second look at the file format in question, and if it really cannot be parsed by file(1), then a short Perl or Python script should do the trick. Good luck!
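
For illustration, the kind of short script suggested above might look roughly like this in Python. It is only a sketch: the function name tail_matches, the max_tail window, and the ENDSIG signature are made up for the example and are not part of file(1) or libmagic. It also assumes a regular, seekable file; on a pipe the seek would fail, which is exactly the limitation described in the answer.

```python
import os


def tail_matches(path, signature, max_tail=64):
    """Return True if `signature` occurs within the last `max_tail` bytes.

    Hypothetical helper: it only works on seekable files, not on pipes.
    """
    with open(path, "rb") as f:
        # Jump straight to the end to learn the size without reading the file.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Step back at most max_tail bytes (less if the file is shorter).
        f.seek(max(0, size - max_tail))
        return signature in f.read()


if __name__ == "__main__":
    import sys

    # Usage: python tailcheck.py FILE   (ENDSIG is a placeholder signature)
    if len(sys.argv) == 2:
        found = tail_matches(sys.argv[1], b"ENDSIG")
        print("trailer found" if found else "trailer not found")
```

Because the file is opened in binary mode and only the last few bytes are read, this avoids the binary-data problems of tac-style reversal and stays fast on large files.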