Download a PDF directly into memory to use it with Python


The goal is to download a PDF file via requests (Python) without saving it to the hard disk. Then I'd like to read it with PdfReader from PyPDF2, again without saving it.

def readFile(self, id):
    req = get(f'{self.workUrl}{id}/content', headers={'OTCSTicket': self.ticket})
    if req.status_code == 200: return req.raw
    else: raise Exception(f'Error Code {req.status_code}')

obj = server.readFile(id)
reader = PdfReader(obj)

Solution

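  • Returning req.raw does not work here: unless the request is made with stream=True, the body has already been read into req.content, and the raw stream is not seekable, which PdfReader needs in order to parse the file. Instead, wrap req.content in io.BytesIO, which creates an in-memory file-like object that PdfReader can read.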

    Like this:

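    import io

    import requests
    from PyPDF2 import PdfReader

    def readFile(self, id):
        # Fetch the document; req.content holds the whole body as bytes
        req = requests.get(
            url=f'{self.workUrl}{id}/content',
            headers={'OTCSTicket': self.ticket}
        )
        if req.ok:
            # BytesIO gives PdfReader a seekable, file-like view of the bytes
            return io.BytesIO(req.content)
        raise Exception(f'Error Code: {req.status_code}')

    reader = PdfReader(server.readFile(id))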
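
One optional extra, sketched here as an assumption rather than something the question asked for: a server can answer with a successful status but an HTML error page in the body, so it may be worth checking that the bytes look like a PDF before wrapping them. The helper name as_pdf_stream and the constant PDF_MAGIC are hypothetical, and the check is deliberately strict (it expects the %PDF- header at the very first byte). The sample bytes stand in for req.content.

```python
import io

PDF_MAGIC = b"%PDF-"  # header that PDF files normally start with


def as_pdf_stream(content: bytes) -> io.BytesIO:
    """Wrap downloaded bytes in a seekable in-memory stream.

    Hypothetical helper: raises ValueError if the payload does not
    start with the PDF header, e.g. when the server sent an error page.
    """
    if not content.startswith(PDF_MAGIC):
        raise ValueError("response body does not look like a PDF")
    return io.BytesIO(content)


# Stand-in for req.content; a real download would supply these bytes.
sample = b"%PDF-1.4\n%%EOF\n"
stream = as_pdf_stream(sample)
```

With a real response this would be used as PdfReader(as_pdf_stream(req.content)). The returned object is a regular io.BytesIO, so it can be rewound with stream.seek(0) and read again without a second download.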