I'm creating a Ruby script which goes through several zip files and validates the content of any XML files within. To optimise my script, I'm using the ruby-zip gem to open the zip files without extracting them.
My initial thought was to use filemagic to determine the MIME-type of the files, but the filemagic gem takes a file path and all I have are these Entry and InputStream classes which are unique to ruby-zip.
Is there a good way to determine the file type without extracting? Ultimately I need to identify XML files, but I can get away with identifying plain-text files and using a regex to look for the
the filemagic gem takes a file path
The filemagic gem's file method takes a file path, but file isn't the only method it has. A glance at the docs reveals it has an io method, too.
all I have are these Entry and InputStream classes which are unique to ruby-zip
I wouldn't say InputStream is "unique to ruby-zip." From the docs (emphasis mine):
A InputStream inherits IOExtras::AbstractInputStream in order to provide an IO-like interface for reading from a single zip entry
So FileMagic has an io method and Zip::InputStream is IO-like. That leads us to a pretty straightforward solution:
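require 'filemagic'
require 'zip'

# Open the archive as a stream; nothing is extracted to disk.
Zip::InputStream.open('/path/to/file.zip') do |io|
  # Only the first entry of the archive is inspected here.
  entry = io.get_next_entry

  # The :mime flag asks FileMagic for MIME information rather than a textual description.
  FileMagic.open(:mime) do |fm|
    p fm.io(entry.get_input_stream)
  end
end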
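If libmagic reports a generic text type for some files, the regex fallback mentioned in the question can be applied to the same kind of IO-like object. The following is only a rough sketch: xml_like? and UTF8_BOM are hypothetical names, not part of rubyzip or filemagic, and the check assumes the document starts with an XML declaration (XML files without one will not be recognised).

```ruby
require 'stringio'

# Byte order mark that some editors put in front of UTF-8 XML files.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: reads the first few bytes from any IO-like object
# that responds to #read(length) and checks for an XML declaration.
def xml_like?(io, sniff_length = 256)
  # #read may return nil at EOF, so coerce to a binary string first.
  head = io.read(sniff_length).to_s.b
  # Drop a leading UTF-8 BOM, if present.
  head = head.byteslice(3..-1) if head.start_with?(UTF8_BOM)
  # Tolerate leading whitespace, then require "<?xml" followed by whitespace.
  head.match?(/\A\s*<\?xml\s/)
end

# Usage with an in-memory stream standing in for a zip entry's stream.
puts xml_like?(StringIO.new('<?xml version="1.0"?><root/>'))
puts xml_like?(StringIO.new('just some plain text'))
```

With rubyzip, the stream returned by entry.get_input_stream should be usable in place of the StringIO above, assuming it supports read with a length argument as an IO-like object would.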