Tags: python, urllib2, cookielib

Storing cookielib cookies in a database


I'm using the cookielib module to handle HTTP cookies when using the urllib2 module in Python 2.6 in a way similar to this snippet:

import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")

I'd like to store the cookies in a database. I don't know what's better: serializing the whole CookieJar object and storing that, or extracting the individual cookies from the CookieJar and storing those. Nor do I know how to implement either approach. I should also be able to recreate the CookieJar object afterwards.

Could someone help me out with the above?

Thanks in advance.


Solution

  • cookielib.Cookie, to quote its docstring (in its sources),

    is deliberately a very simple class. It just holds attributes.

    so pickle (or any other serialization approach) is just fine for saving and restoring each Cookie instance.

    As for CookieJar, set_cookie adds one cookie instance, and __iter__ (to use the latter, just do a for loop over the jar instance) yields all the cookie instances it holds, one after the other.

    A subclass that you can use to see how to make a "cookie jar on a database" is BSDDBCookieJar (part of mechanize; the link points specifically to the jar's source file) -- it doesn't load all cookies into memory, but rather keeps them in self._db, a bsddb instance (a mostly-on-disk, dict-like hash table constrained to having only strings as keys and values), and uses pickle for serialization.

    If you are OK with keeping every cookie in memory during operations, simply pickleing the jar is simplest (and, of course, put the blob in the DB and get it back from there when you're restarting) -- s = cPickle.dumps(myJar, -1) gives you a big byte string for the whole jar (and policy thereof, of course, not just the cookies), and theJar = cPickle.loads(s) rebuilds it once you've reloaded s as a blob from the DB.