python stream zip on-the-fly

Python: How to zip a file on the fly while uploading it to SFTP?


How can I upload a file to an SFTP server while zipping it on the fly?

In other words, how can I take a local file, compress it, and upload it to the SFTP server at the same time?


Solution

  • I found a solution that uses a GzipFile writer and hands it a file-like object that acts as a spy: it captures the compressed data without actually writing it to a file.
    A generator yields the captured data, and a separate thread pushes it into a queue that the upload reads from. Note that the output is gzip data, not a .zip archive.

    Long story short, here is the code:

    Usage

    sftp.putfo(ReaderMaker(unzipped_file_path), remote_path)
    

    Implementation

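    import queue
    from gzip import GzipFile
    from threading import Thread

    COMPRESS_LEVEL = 9
    BUFF_SIZE = 1024 * 1024

    class ReaderMaker:
        """File-like object whose read() returns gzip-compressed chunks of inputFile."""

        def __init__(self, inputFile):
            self.it = self.compressor(inputFile)
            # Bounded queue: the compressing thread blocks when the uploader falls behind.
            self.queue = queue.Queue(10)
            # Daemon thread, so an abandoned upload does not keep the interpreter alive.
            task = Thread(target=self.zipper, daemon=True)
            task.start()

        def zipper(self):
            # Runs on the worker thread: queue every non-empty compressed chunk.
            for data in self.it:
                if data:
                    self.queue.put(data)
            self.queue.put(b'')  # an empty chunk tells the reader the stream has ended

        def read(self, n=-1):
            # n is ignored: each call returns one whole queued chunk, or b'' at the end.
            return self.queue.get()

        def compressor(self, inputFile):
            class Spy:
                """Write-only sink that collects whatever GzipFile writes to it."""

                def __init__(self):
                    self.data = b''

                def write(self, d):
                    self.data += d
                    return len(d)

                def drain(self):
                    # Hand over the bytes collected so far and reset the buffer.
                    d = self.data
                    self.data = b''
                    return d

            spy = Spy()

            with open(inputFile, 'rb') as f_in:
                # 'dummy.gz' is only recorded in the gzip header; all output goes to spy.
                f_out = GzipFile('dummy.gz', 'wb', COMPRESS_LEVEL, spy)

                while True:
                    data = f_in.read(BUFF_SIZE)
                    if not data:
                        break
                    f_out.write(data)
                    yield spy.drain()

                f_out.close()  # writes the remaining compressed data and the gzip trailer

            yield spy.drain()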
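
The same streaming idea can also be sketched without a worker thread or queue. This is an alternative technique, not the approach from the answer above: it relies on zlib.compressobj with wbits set to zlib.MAX_WBITS | 16, which makes zlib produce gzip-framed output, and compresses only as much input as each read() call needs. The class name GzipStreamReader and its parameters are made up for this illustration.

```python
import zlib


class GzipStreamReader:
    """Read-only file-like object that yields the gzip-compressed bytes of a file."""

    def __init__(self, path, level=9, chunk_size=1024 * 1024):
        self._src = open(path, "rb")
        # MAX_WBITS | 16 asks zlib for a gzip header and trailer around the deflate data.
        self._comp = zlib.compressobj(level, zlib.DEFLATED, zlib.MAX_WBITS | 16)
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self._done = False

    def read(self, n=-1):
        # Compress more input until n bytes are buffered or the source is exhausted.
        while not self._done and (n < 0 or len(self._buf) < n):
            data = self._src.read(self._chunk_size)
            if data:
                self._buf += self._comp.compress(data)
            else:
                # End of input: emit the final deflate block and the gzip trailer.
                self._buf += self._comp.flush()
                self._src.close()
                self._done = True
        if n < 0:
            n = len(self._buf)
        out = bytes(self._buf[:n])
        del self._buf[:n]
        return out  # b'' once everything has been handed out
```

Assuming the same sftp object and paths as in the usage line above, it would be passed the same way: sftp.putfo(GzipStreamReader(unzipped_file_path), remote_path). Unlike the threaded version, compression and upload take turns here instead of overlapping, so this trades some throughput for simpler control flow.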