I'm trying to get spreadsheet data from zipped .xlsx files. I'm using rubyzip
to access the contents of the zipfile
Zip::File.open(file_path) do |zip_file|
zip_file.each do |entry|
*process entry*
end
end
My problem is that rubyzip gives a Zip::Entry
object, which, I cant get to work with gems like roo
or creek
.
I've done something similar, but with .csv file. This was as simple as CSV.parse(entry.get_input_stream.read)
. However, that just gives me a string of encoded gibberish when using it on an .xlsx file.
I've looked around and the closest answer I got was temporarily extracting the files, but I want to avoid doing this since the files can get pretty large.
Does anyone have any suggestions? Thanks in advance.
So what you need to do is convert the stream into an IO
object that Roo
can understand.
To determine if the object passed to Roo::Spreadsheet.open
is a "stream" Roo
uses the following method:
def is_stream?(filename_or_stream)
filename_or_stream.respond_to?(:seek)
end
Since a Zip::InputStream
does not respond to seek
you cannot use this object directly. To get around this we simply need an object that does respond to seek
(like a StringIO
)
We can just read
the input stream into the StringIO
directly:
stream = StringIO.new(entry.get_input_stream.read)
Or the Zip
library also provides a method to copy a Zip::InputStream
to another IO
object through the IOExtras
module, which I think reads fairly nicely as well.
Knowing all of the above we can implement as follows:
Zip::File.open(file_path) do |zip_file|
zip_file.each do |entry|
# make sure Roo can handle the file (at least based on the extension)
ext = File.extname(entry.name)&.to_sym
next unless Roo::CLASS_FOR_EXTENSION[ext]
# stream = StringIO.new(entry.get_input_stream.read)
::Zip::IOExtras.copy_stream(stream = StringIO.new, entry.get_input_stream)
spreadsheet = Roo::Spreadsheet.open(stream, extension: ext)
# process file
end
end