I'm trying to test some python code that uses urllib2 and lxml.
I've seen several blog posts and stack overflow posts where people want to test exceptions being thrown, with urllib2. I haven't seen examples testing successful calls.
Am I going down the correct path?
Does anyone have a suggestion for getting this to work?
Here is what I have so far:
import mox
import urllib
import urllib2
import socket
from lxml import etree
# set up the test
m = mox.Mox()
response = m.CreateMock(urllib.addinfourl)
response.fp = m.CreateMock(socket._fileobject)
response.name = None # Needed because the file name is checked.
response.fp.read().AndReturn("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
response.geturl().AndReturn("http://rss.slashdot.org/Slashdot/slashdot")
response.read = response.fp.read # Needed since __init__ is not called on addinfourl.
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()
# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
# Note: response2.fp.read() and response2.read() do not behave the same, as defined above.
# In [21]: response2.fp.read()
# Out[21]: '<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'
# In [22]: response2.read()
# Out[22]: <mox.MockMethod object at 0x97f326c>
xcontent = etree.parse(response2)
# verify test
m.VerifyAll()
It fails with:
Traceback (most recent call last):
File "/home/jon/mox_question.py", line 22, in <module>
xcontent = etree.parse(response2)
File "lxml.etree.pyx", line 2583, in lxml.etree.parse (src/lxml/lxml.etree.c:25057)
File "parser.pxi", line 1487, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:63708)
File "parser.pxi", line 1517, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:63999)
File "parser.pxi", line 1400, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:62985)
File "parser.pxi", line 990, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:60508)
File "parser.pxi", line 542, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:56659)
File "parser.pxi", line 624, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:57472)
File "lxml.etree.pyx", line 235, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:6222)
File "parser.pxi", line 371, in lxml.etree.copyToBuffer (src/lxml/lxml.etree.c:55252)
TypeError: reading from file-like objects must return byte strings or unicode strings
This is because response.read() does not return what I expected it to return.
I wouldn't delve into urllib2 internals at all. It's beyond the scope of what you care about I think. Here's a simple way to do it with StringIO. The key thing here is that what you intent to parse as XML just needs to be file-like in terms of duck typing, it doesn't need to be an actual addinfourl instance.
import StringIO
import mox
import urllib2
from lxml import etree
# set up the test
m = mox.Mox()
response = StringIO.StringIO("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()
# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
xcontent = etree.parse(response2)
# verify test
m.VerifyAll()