
Python - urllib2 & cookielib


I am trying to open the following website, retrieve the initial cookie, and reuse it for the second request. However, when I run the code below, it prints two different cookies. How do I reuse the initial cookie for the second request?

import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

home = opener.open('https://www.idcourts.us/repository/start.do')
print cj

search = opener.open('https://www.idcourts.us/repository/partySearch.do')
print cj

The output shows two different cookies every time:

<cookielib.CookieJar[<Cookie JSESSIONID=0DEEE8331DE7D0DFDC22E860E065085F for www.idcourts.us/repository>]>
<cookielib.CookieJar[<Cookie JSESSIONID=E01C2BE8323632A32DA467F8A9B22A51 for www.idcourts.us/repository>]>

Solution

  • This is not a problem with urllib2. The site does something unusual: it only validates your session id after you request a couple of its stylesheets with that id attached:

    import cookielib, urllib2
    
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    # default User-Agent ('Python-urllib/2.6') will *not* work
    opener.addheaders = [
        ('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11'),
        ]
    
    
    stylesheets = [
        'https://www.idcourts.us/repository/css/id_style.css',
        'https://www.idcourts.us/repository/css/id_print.css',
    ]
    
    home = opener.open('https://www.idcourts.us/repository/start.do')
    print cj
    sessid = cj._cookies['www.idcourts.us']['/repository']['JSESSIONID'].value
    # Note the +=: this appends the Referer to the existing headers instead of replacing them
    opener.addheaders += [
        ('Referer', 'https://www.idcourts.us/repository/start.do'),
        ]
    for st in stylesheets:
        # The trick: request each stylesheet with the session id appended to the URL
        opener.open(st+';jsessionid='+sessid)
    search = opener.open('https://www.idcourts.us/repository/partySearch.do')
    print cj
    # You may need to keep updating the Referer for subsequent requests
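
The snippet above reads the session id through the private cj._cookies attribute and adds the Referer with +=, which would leave duplicate Referer headers if repeated. A minimal sketch of the same steps using only the public CookieJar iteration is shown below. The helper names (get_cookie_value, with_jsessionid, replace_header, make_session_cookie) and the session value 'ABC123' are hypothetical and only for illustration; the demo builds a cookie by hand and never contacts the site, so it says nothing about how the site itself behaves.

```python
try:
    from http.cookiejar import Cookie, CookieJar  # Python 3
except ImportError:
    from cookielib import Cookie, CookieJar  # Python 2, as in the post


def get_cookie_value(jar, name, domain=None, path=None):
    """Return the value of the first matching cookie, or None if there is none."""
    for cookie in jar:  # iterating a CookieJar yields Cookie objects
        if cookie.name != name:
            continue
        if domain is not None and cookie.domain != domain:
            continue
        if path is not None and cookie.path != path:
            continue
        return cookie.value
    return None


def with_jsessionid(url, session_id):
    """Append the session id to the URL, as the answer does for the stylesheets."""
    return url + ';jsessionid=' + session_id


def replace_header(headers, name, value):
    """Return a new header list with any existing `name` entry replaced."""
    kept = [(k, v) for (k, v) in headers if k.lower() != name.lower()]
    return kept + [(name, value)]


def make_session_cookie(value):
    """Build a JSESSIONID cookie by hand (hypothetical, for the offline demo only)."""
    return Cookie(
        version=0, name='JSESSIONID', value=value,
        port=None, port_specified=False,
        domain='www.idcourts.us', domain_specified=False,
        domain_initial_dot=False,
        path='/repository', path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


if __name__ == '__main__':
    jar = CookieJar()
    jar.set_cookie(make_session_cookie('ABC123'))  # made-up session id
    sessid = get_cookie_value(jar, 'JSESSIONID', path='/repository')
    print(with_jsessionid(
        'https://www.idcourts.us/repository/css/id_style.css', sessid))
```

With a real opener, the same helpers would slot in as sessid = get_cookie_value(cj, 'JSESSIONID') after the first request, and opener.addheaders = replace_header(opener.addheaders, 'Referer', url) before each later one.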