Tags: python, xml, parsing, bit.ly

Parsing XML response of bit.ly


I was trying out the bit.ly API for URL shortening and got it to work. It returns an XML document to my script. I want to extract a tag from it but can't seem to parse it properly.

import urllib2

askfor = urllib2.Request(full_url)
response = urllib2.urlopen(askfor)
the_page = response.read()  # the whole response body, as a string

So the_page contains the XML document. I tried:

from xml.dom.minidom import parse
doc = parse(the_page)

This causes an error. What am I doing wrong?


Solution

  • You don't provide an error message, so I can't be sure this is the only error. However, xml.dom.minidom.parse does not take a string of XML; a string argument is treated as a filename. From the docstring for parse:

    Parse a file into a DOM by filename or file object.

    You should try:

    response = urllib2.urlopen(askfor)
    doc = parse(response)
    

    This works because response behaves like a file object. Alternatively, use the parseString function in minidom and pass the_page as the argument.

    EDIT: to extract the URL, you'll need to do:

    url_nodes = doc.getElementsByTagName('url')
    url = url_nodes[0]
    print url.childNodes[0].data
    

    getElementsByTagName returns a list of all matching nodes (just one in this case). url is an Element, as you noticed; it contains a child Text node, whose data attribute holds the text you need.
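
To tie the pieces together, here is a minimal sketch that runs without a network call. It is written for Python 3, where urllib2 has become urllib.request; the minidom calls are the same in both versions. The SAMPLE_XML document and the first_tag_text helper are assumptions made up for illustration, and the real bit.ly response may be laid out differently.

```python
import io
from xml.dom.minidom import parse, parseString

# Hypothetical stand-in for the_page; the real bit.ly response may differ.
SAMPLE_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<response>
  <status_code>200</status_code>
  <data>
    <url>http://bit.ly/abc123</url>
  </data>
</response>"""


def first_tag_text(doc, tag_name):
    """Return the text of the first <tag_name> element, or None if absent.

    Assumes the element's first child is a Text node, as with <url> above.
    """
    nodes = doc.getElementsByTagName(tag_name)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


# parseString takes the document itself (what response.read() returns).
doc_from_string = parseString(SAMPLE_XML)

# parse takes a filename or a file-like object, such as the urlopen response;
# io.BytesIO plays the role of that response here.
doc_from_file = parse(io.BytesIO(SAMPLE_XML))

print(first_tag_text(doc_from_string, "url"))
print(first_tag_text(doc_from_file, "url"))
```

Both routes build the same DOM, so the choice only depends on whether you still hold the open response or have already read it into a string.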