pythonpython-requestschunked-encodinghttp-chunked

Python Requests - ChunkedEncodingError(e) - requests.iter_lines


I'm getting a ChunkedEncodingError(e) using Python requests. I'm using the following to rip down JSON:

r = requests.get(url, headers=auth, stream=True)

And the iterating over each line, using the carriage return as a delimiter, which is how this API distinguishes between distinct JSON events.

for d in r.iter_lines(delimiter="\n"):
    d += "\n"
    sock.send(d)

I'm delimiting on the carriage return and then adding it back in as the endpoint I'm pushing the logs to actually expects a carriage return at the end of each event also. This seems to work for roughly 100k log files. When I try to make a larger call I'll get this following thrown:

for d in r.iter_lines(delimiter="\n"):
logs_1           |   File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 783, in iter_lines
logs_1           |     for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
logs_1           |   File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 742, in generate
logs_1           |     raise ChunkedEncodingError(e)
logs_1           | requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))

UPDATE: I've discovered the API is sending back a NoneType at some point as well. So how can I account for this null byte somewhere in the response without blowing everything up? Each individual event is ended with a \n, and I need to be able to inspect each even individually. Should I chunk the content instead of iter_lines? Then ensure there is no NoneType in the chunk? That way I don't try to iter_lines over a NoneType and it blows up?


Solution

  • ChunkedEncodingError is caused by: httplib.IncompletedRead

    enter image description here

    import httplib
    
    def patch_http_response_read(func):
        def inner(*args):
            try:
                return func(*args)
            except httplib.IncompleteRead, e:
                return e.partial
        return inner
    
    httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)
    

    I think this could be a patch. It allows you to deal with defective http servers.

    Most servers transmit all data, but due implementation errors they wrongly close session and httplib raise error and bury your precious bytes.