
Using generators and cStringIO in Python to stream strings


I'm trying to read a very large string, stored in a Python dictionary, as a stream using cStringIO:

def stream_read(self, path):
    try:
        # create a string stream from the contents at 'path'
        # note: the string at self._my_dict[path] is 7MB in size
        stream = StringIO.StringIO(self._my_dict[path])
    except KeyError:
        raise IOError("Could not get content")
    while True:
        # buffer size is 128kB, or 128 * 1024
        buf = stream.read(self.buffer_size)
        if not buf:
            # end of stream: returning ends the generator cleanly
            # (raising StopIteration inside a generator is an error since Python 3.7)
            return
        yield buf
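For comparison, the same chunked read can be sketched as a standalone generator. This is an illustration, not the mock's actual code: the function name `iter_chunks` is hypothetical, the 128 kB default mirrors the buffer size mentioned above, and it is written for Python 3, with `io.StringIO` standing in for Python 2's `StringIO`/`cStringIO`. The two-argument form of `iter()` keeps calling `stream.read(buffer_size)` until it returns the sentinel `''`, so the generator finishes without raising `StopIteration` by hand.

```python
import io
from functools import partial


def iter_chunks(text, buffer_size=128 * 1024):
    """Hypothetical helper: yield `text` in pieces of at most `buffer_size` characters."""
    stream = io.StringIO(text)
    # iter(callable, sentinel) calls stream.read(buffer_size) repeatedly
    # and stops as soon as it returns '' (end of stream).
    for chunk in iter(partial(stream.read, buffer_size), ''):
        yield chunk


# Usage: 300 kB of text splits into 128 kB + 128 kB + 44 kB chunks.
content = 'x' * (300 * 1024)
sizes = [len(c) for c in iter_chunks(content)]
print(sizes)  # [131072, 131072, 45056]
```

Joining the chunks with `''.join(iter_chunks(content))` gives back the original string.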

And in my test suite, I'm testing this function by first testing stream_write, asserting that the data exists at that path, and then calling stream_read:

def test_stream(self):
    filename = self.gen_random_string()
    # test 7MB
    content = self.gen_random_string(7 * 1024 * 1024)
    # test stream write
    io = StringIO.StringIO(content)
    self._storage.stream_write(filename, io)
    io.close()
    self.assertTrue(self._storage.exists(filename))
    # test read / write
    data = ''
    for buf in self._storage.stream_read(filename):
        data += buf
    self.assertEqual(content, data)

However, the test fails with an AssertionError:

======================================================================
FAIL: test_stream (test_swift_storage.TestSwiftStorage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/bacongobbler/.../test/test_local_storage.py", line 44, in test_stream
    self.assertEqual(content, data)
AssertionError: '[squelched]' != '<cStringIO.StringI object at 0x3148e70>'
----------------------------------------------------------------------
Ran 28 tests in 20.495s

FAILED (failures=1)

It looks related to an issue I posted last week, but I still don't understand why `data` ends up holding the string representation of an object instead of the actual content.

If anyone wants to take a closer look at the source code, it's all up at https://github.com/bacongobbler/docker-registry/blob/106-swift-storage/test/utils/mock_swift_storage.py


Solution

  • You store only the StringIO object, not its contents, when calling self._storage.stream_write(filename, io):

    def put_content(self, path, content, chunk=None):
        path = self._init_path(path)
        try:
            self._swift_container[path] = content
        except Exception:
            raise IOError("Could not put content")
    

    where content is the io object you passed in.

    Later on, you pass that file object to StringIO again:

    stream = StringIO.StringIO(self.get_content(path))
    

    Because the argument is not a string, StringIO.StringIO calls str() on self.get_content(path) and stores the string representation of the cStringIO.StringI instance rather than its contents:

    >>> from cStringIO import StringIO
    >>> str(StringIO('test data'))
    '<cStringIO.StringI object at 0x1074ea470>'
    

    Your reading code works fine; it is your writing mock that needs to actually read the data out of the StringIO object.

    A .read() call will do here:

    def put_content(self, path, content, chunk=None):
        path = self._init_path(path)
        try:
            # store the data itself, not the file-like object
            self._swift_container[path] = content.read()
        except Exception:
            raise IOError("Could not put content")
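The whole round trip can be sketched with a small in-memory store. This is a hypothetical stand-in for the Swift mock in the linked repository, not its real code: the class name `InMemoryStorage`, its dict-backed container, and the choice to accept both plain strings and file-like objects in `put_content` are assumptions. It is written for Python 3, with `io.StringIO` in place of `StringIO`/`cStringIO`. Normalizing the input at write time means the container only ever holds strings, so the reader never wraps a file object by mistake.

```python
import io
from functools import partial


class InMemoryStorage(object):
    """Hypothetical mock storage that keeps plain strings in a dict."""

    buffer_size = 128 * 1024

    def __init__(self):
        self._container = {}

    def put_content(self, path, content):
        # Assumption: callers may pass a string or a file-like object.
        # Either way, store the data itself, never the file object.
        if hasattr(content, 'read'):
            content = content.read()
        self._container[path] = content

    def stream_write(self, path, fp):
        self.put_content(path, fp)

    def exists(self, path):
        return path in self._container

    def stream_read(self, path):
        # Runs on the first next() call, since this is a generator.
        try:
            data = self._container[path]
        except KeyError:
            raise IOError('Could not get content')
        stream = io.StringIO(data)
        # Stop when read() returns '' (end of stream).
        for chunk in iter(partial(stream.read, self.buffer_size), ''):
            yield chunk


# Usage: mirrors the question's test_stream round trip.
storage = InMemoryStorage()
content = 'a' * (7 * 1024 * 1024)
storage.stream_write('blob', io.StringIO(content))
assert storage.exists('blob')
assert ''.join(storage.stream_read('blob')) == content
```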