python urllib2 lxml unit-testing mox

Mocking urllib2.urlopen and lxml.etree.parse using pymox


I'm trying to test some Python code that uses urllib2 and lxml.

I've seen several blog posts and Stack Overflow posts where people test exceptions thrown by urllib2, but I haven't seen examples that test successful calls.

Am I going down the correct path?

Does anyone have a suggestion for getting this to work?

Here is what I have so far:

import mox
import urllib
import urllib2
import socket
from lxml import etree

# set up the test
m = mox.Mox()
response = m.CreateMock(urllib.addinfourl)
response.fp = m.CreateMock(socket._fileobject)
response.name = None # Needed because the file name is checked.
response.fp.read().AndReturn("""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>""")
response.geturl().AndReturn("http://rss.slashdot.org/Slashdot/slashdot")
response.read = response.fp.read # Needed since __init__ is not called on addinfourl.
m.StubOutWithMock(urllib2, 'urlopen')
urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
m.ReplayAll()

# code under test
response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
# Note: response2.fp.read() and response2.read() do not behave the same, as defined above.
# In [21]: response2.fp.read()
# Out[21]: '<?xml version="1.0" encoding="utf-8"?>\n<foo>bar</foo>'
# In [22]: response2.read()
# Out[22]: <mox.MockMethod object at 0x97f326c>
xcontent = etree.parse(response2)

# verify test
m.VerifyAll()

It fails with:

Traceback (most recent call last):
  File "/home/jon/mox_question.py", line 22, in <module>
    xcontent = etree.parse(response2)
  File "lxml.etree.pyx", line 2583, in lxml.etree.parse (src/lxml/lxml.etree.c:25057)
  File "parser.pxi", line 1487, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:63708)
  File "parser.pxi", line 1517, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:63999)
  File "parser.pxi", line 1400, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:62985)
  File "parser.pxi", line 990, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:60508)
  File "parser.pxi", line 542, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:56659)
  File "parser.pxi", line 624, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:57472)
  File "lxml.etree.pyx", line 235, in lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:6222)
  File "parser.pxi", line 371, in lxml.etree.copyToBuffer (src/lxml/lxml.etree.c:55252)
TypeError: reading from file-like objects must return byte strings or unicode strings

This is because response2.read() returns a mox.MockMethod object rather than the XML string I expected, so lxml has no string to parse.
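
One way to confirm this kind of problem before calling etree.parse is to check what read() actually returns. The sketch below does not use mox: BrokenResponse and GoodResponse are hypothetical stand-ins for a misconfigured and a correctly configured response, and read_returns_text is an illustrative helper, not part of any library.

```python
class BrokenResponse(object):
    """Hypothetical stand-in for the misconfigured mock:
    read() hands back an arbitrary object instead of a string."""

    def read(self, size=-1):
        return object()


class GoodResponse(object):
    """Hypothetical stand-in whose read() returns a byte string."""

    def read(self, size=-1):
        return b"<foo>bar</foo>"


def read_returns_text(file_like):
    """Return True if file_like.read() yields a byte string or a
    unicode string, which is what the parser's error message asks for."""
    text_types = (bytes, type(u""))  # covers str/unicode on Python 2 and bytes/str on Python 3
    return isinstance(file_like.read(), text_types)


broken_ok = read_returns_text(BrokenResponse())
good_ok = read_returns_text(GoodResponse())
```

Here broken_ok is False and good_ok is True, which mirrors the difference between response2.read() and response2.fp.read() in the session above.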


Solution

  • I wouldn't delve into urllib2 internals at all; I think that's beyond the scope of what you care about here. Here's a simple way to do it with StringIO. The key point is that what you intend to parse as XML only needs to be file-like in the duck-typing sense; it doesn't need to be an actual addinfourl instance.

    import StringIO
    import mox
    import urllib2
    from lxml import etree
    
    # set up the test
    m = mox.Mox()
    response = StringIO.StringIO("""<?xml version="1.0" encoding="utf-8"?>
    <foo>bar</foo>""")
    m.StubOutWithMock(urllib2, 'urlopen')
    urllib2.urlopen(mox.IgnoreArg(), timeout=10).AndReturn(response)
    m.ReplayAll()
    
    # code under test
    response2 = urllib2.urlopen("http://rss.slashdot.org/Slashdot/slashdot", timeout=10)
    xcontent = etree.parse(response2)
    
    # verify test
    m.VerifyAll()
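
The same duck-typing idea carries over to Python 3. The sketch below is a translation under stated assumptions, not part of the original answer: it swaps mox for the standard library's unittest.mock, urllib2 for urllib.request, StringIO for io.BytesIO, and lxml for xml.etree.ElementTree so that it runs without third-party packages. fetch_root_text is a hypothetical function standing in for the code under test.

```python
import io
import urllib.request
import xml.etree.ElementTree as ET
from unittest import mock

URL = "http://rss.slashdot.org/Slashdot/slashdot"
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<foo>bar</foo>"""


def fetch_root_text(url):
    """Hypothetical code under test: fetch a URL and return the root element's text."""
    response = urllib.request.urlopen(url, timeout=10)
    return ET.parse(response).getroot().text


# Replace urlopen with a mock that returns an in-memory file-like object.
# No network access happens while the patch is active.
with mock.patch("urllib.request.urlopen", return_value=io.BytesIO(XML)) as fake_urlopen:
    text = fetch_root_text(URL)

# Rough equivalent of m.VerifyAll(): check that urlopen was called as expected.
fake_urlopen.assert_called_once_with(URL, timeout=10)
```

Since lxml's etree.parse also accepts a file-like object, as the answer above relies on, the ET.parse call could be replaced with lxml.etree.parse where lxml is installed.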