Executing JavaScript from Python


I have HTML web pages that I am crawling using XPath. Calling etree.tostring on a certain node gives me this string:

<script>
<!--
function escramble_758(){
  var a,b,c
  a='+1 '
  b='84-'
  a+='425-'
  b+='7450'
  c='9'
  document.write(a+c+b)
}
escramble_758()
//-->
</script>

I just need the output of escramble_758(). I can write a regex to figure out the whole thing, but I want my code to remain tidy. What is the best alternative?

I have skimmed through several libraries, but I didn't see an exact solution. Most of them try to emulate a browser, which makes things painfully slow.

Edit: An example would be great (bare-bones will do).


Solution

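  • You can do this with PyV8. However, document.write has to be replaced with return, because there is no DOM and therefore no document object. The script's last statement, escramble_758(), then becomes the value that ctx.eval returns.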

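    import PyV8

    # Create a bare V8 context (no DOM, no document) and make it current.
    ctx = PyV8.JSContext()
    ctx.enter()

    # The script body from the page, without the <script> tags and comment markers.
    js = """
    function escramble_758(){
      var a,b,c
      a='+1 '
      b='84-'
      a+='425-'
      b+='7450'
      c='9'
      document.write(a+c+b)
    }
    escramble_758()
    """

    # Turn document.write(...) into return (...) so the function hands back its
    # result; eval then yields the value of the final escramble_758() call.
    print(ctx.eval(js.replace("document.write", "return ")))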
    

    Alternatively, you could expose a mock document object to the script, so that the JavaScript runs unmodified:

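    # Reuses the PyV8 import and the js string from the snippet above.
    class MockDocument(object):
        """Stand-in for the browser's document; collects everything written."""

        def __init__(self):
            self.value = ''

        def write(self, *args):
            # Append every argument, converted to a string, to the buffer.
            self.value += ''.join(str(i) for i in args)


    class Global(PyV8.JSClass):
        """Global scope for the script; its attributes become JS globals."""

        def __init__(self):
            self.document = MockDocument()

    scope = Global()
    ctx = PyV8.JSContext(scope)
    ctx.enter()
    ctx.eval(js)  # document.write now lands in MockDocument.write
    print(scope.document.value)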
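
For scripts as simple as the one in the question, the regex route mentioned there may also be workable without any JavaScript engine. The following is only a minimal sketch and not part of the answer above: the names unscramble, ASSIGN_RE, WRITE_RE and SCRIPT are made up for illustration, and it assumes the script only assigns single-quoted string literals with = or += and passes a +-joined list of variables to document.write. Anything more complex would still need a real engine such as PyV8.

```python
import re

# Assumption: statements look like  a='+1 '  or  b+='7450'  (one per line).
ASSIGN_RE = re.compile(r"^\s*(\w+)\s*(\+?=)\s*'([^']*)'\s*;?\s*$", re.MULTILINE)
# Assumption: the result is emitted by a single  document.write(a+c+b)  call.
WRITE_RE = re.compile(r"document\.write\(([^)]*)\)")


def unscramble(script):
    """Return the string the script would pass to document.write."""
    variables = {}
    for name, op, value in ASSIGN_RE.findall(script):
        if op == "=":
            variables[name] = value  # plain assignment
        else:
            variables[name] = variables.get(name, "") + value  # += appends
    match = WRITE_RE.search(script)
    if match is None:
        raise ValueError("no document.write() call found")
    # Concatenate the variables in the order they appear in the call.
    return "".join(variables[part.strip()] for part in match.group(1).split("+"))


# The string from the question, as returned by etree.tostring.
SCRIPT = """<script>
<!--
function escramble_758(){
  var a,b,c
  a='+1 '
  b='84-'
  a+='425-'
  b+='7450'
  c='9'
  document.write(a+c+b)
}
escramble_758()
//-->
</script>"""

print(unscramble(SCRIPT))  # prints: +1 425-984-7450
```

This trades generality for speed and zero dependencies: it never executes JavaScript, so it breaks as soon as the page uses double quotes, function calls, or anything beyond plain string concatenation.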