Tags: python, curl, temporary-files, cookiejar, cookielib

Writing to NamedTemporaryFile fails silently; converting Curl cookie jar to Requests cookies


I'm trying to take the Netscape HTTP Cookie File that Curl spits out and convert it to a Cookiejar that the Requests library can work with. I have netscapeCookieString in my Python script as a variable, which looks like:

# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

.miami.edu  TRUE    /   TRUE    0   PS_LASTSITE https://canelink.miami.edu/psc/PUMI2J/

Since I don't want to parse the cookie file myself, I'd like to use cookielib. Sadly, this means I have to write to disk since cookielib.MozillaCookieJar() won't take a string as input: it has to take a file.

So I'm using NamedTemporaryFile (couldn't get SpooledTemporaryFile to work; again would like to do all of this in memory if possible).

tempCookieFile = tempfile.NamedTemporaryFile()

# now take the contents of the cookie string and put it into this in memory file
# that cookielib will read from. There are a couple quirks though. 
for line in netscapeCookieString.splitlines():

    # cookielib doesn't know how to handle httpOnly cookies correctly
    # so we have to do some pre-processing to make sure they make it into
    # the cookielib. Basically just removing the httpOnly prefix which is honestly
    # an abuse of the RFC in the first place. note: httpOnly actually refers to
    # cookies that javascript can't access, as in only http protocol can
    # access them, it has nothing to do with http vs https. it's purely 
    # to protect against XSS a bit better. These cookies may actually end up
    # being the most critical of all cookies in a given set.
    # https://stackoverflow.com/a/53384267/2611730
    if line.startswith("#HttpOnly_"):
        # this is actually how the curl library removes the httpOnly, by doing length
        line = line[len("#HttpOnly_"):]

    tempCookieFile.write(line)

tempCookieFile.flush()

# another thing that cookielib doesn't handle very well is 
# session cookies, which have 0 in the expires param
# so we have to make sure they don't get expired when they're
# read in by cookielib
# https://stackoverflow.com/a/14759698/2611730
print tempCookieFile.read()
cookieJar = cookielib.MozillaCookieJar(tempCookieFile.name)
cookieJar.load(ignore_expires=True)
pprint.pprint(cookieJar)

But here's the kicker, this doesn't work!

print tempCookieFile.read() prints an empty line.

Thus, pprint.pprint(cookieJar) prints an empty cookie jar.

I was easily able to reproduce this on my Mac:

>>> import tempfile
>>> tempCookieFile = tempfile.NamedTemporaryFile()
>>> tempCookieFile.write("hey")
>>> tempCookieFile.flush()
>>> print tempCookieFile.read()

>>>

How can I actually write to a NamedTemporaryFile?


Solution

  • After you write to the file, the file pointer sits just past the data you wrote (in your case, at the end of the file). That is why read() returns an empty string: there is nothing left to read after that position. Seek back to 0 before reading:

    >>> import tempfile
    >>> tempCookieFile = tempfile.NamedTemporaryFile()
    >>> tempCookieFile.write("hey")
    >>> tempCookieFile.seek(0)
    >>> print(tempCookieFile.read())
    hey
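
Note that seek(0) only fixes the print. MozillaCookieJar opens the file again by name, so it needs the flush() but does not care where your handle's pointer is. As far as I can tell from the posted loop, the empty jar has a separate cause: splitlines() drops the line endings, so write(line) glues every line into one long line, and the jar then sees only a header line and no cookies. Below is a minimal sketch of the whole conversion under some assumptions: it is written for Python 3 (where cookielib is called http.cookiejar), the helper name load_curl_cookies and the sample string are made up for illustration, and the sample uses tabs between fields because that is what the Netscape format expects.

```python
import http.cookiejar  # Python 3 name for Python 2's cookielib
import tempfile

# Hypothetical sample input modeled on the question's cookie file.
# Fields are tab-separated, as in the Netscape cookie file format.
netscape_cookie_string = (
    "# Netscape HTTP Cookie File\n"
    "# https://curl.haxx.se/docs/http-cookies.html\n"
    "# This file was generated by libcurl! Edit at your own risk.\n"
    "\n"
    ".miami.edu\tTRUE\t/\tTRUE\t0\tPS_LASTSITE\t"
    "https://canelink.miami.edu/psc/PUMI2J/\n"
)


def load_curl_cookies(cookie_text):
    """Parse curl's Netscape-format cookie text into a MozillaCookieJar."""
    jar = http.cookiejar.MozillaCookieJar()
    # mode="w+" opens the file in text mode so we can write str, not bytes.
    # Reopening the file by name while it is still open works on
    # Linux/macOS; on Windows this may fail.
    with tempfile.NamedTemporaryFile(mode="w+") as tmp:
        for line in cookie_text.splitlines():
            # Strip curl's httpOnly marker so the line parses as a cookie.
            if line.startswith("#HttpOnly_"):
                line = line[len("#HttpOnly_"):]
            # splitlines() removed the newline, so add it back.
            tmp.write(line + "\n")
        # Push buffered data to disk before the jar reopens the file by name.
        tmp.flush()
        # Keep session cookies (expires == 0) instead of dropping them.
        jar.load(tmp.name, ignore_discard=True, ignore_expires=True)
    return jar


cookie_jar = load_curl_cookies(netscape_cookie_string)
for cookie in cookie_jar:
    print(cookie.name, cookie.domain)
```

The resulting jar can then be handed to Requests through the cookies argument, for example requests.get(url, cookies=cookie_jar).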