python google-app-engine selenium scrapy windmill

Simulating a browser on Google App Engine


I want to use Selenium or Windmill inside Google App Engine to scrape a JavaScript-heavy website. I know that Windmill is written in Python and JavaScript.

Is this possible? If it is, how do I include the library?
If not, could you explain why and provide alternatives?

Thanks.

Update

I searched a little more and saw that Scrapy is pure Python.
Will that work? Does it handle JavaScript?


Solution

  • Both Selenium and Windmill (which I think is now unmaintained) are controllers for a real browser. They usually spawn a real browser (e.g. Firefox) as a subprocess and control it. I don't think you can do that on App Engine. The closest thing to a pure-code browser that I know of is HtmlUnit, but that's Java. As far as I know, there is no equivalent for Python.
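
To illustrate why a real browser (or something like HtmlUnit) is needed for a JS-filled site, here is a minimal sketch of what a pure-Python parser sees. It assumes Python 3's standard-library `html.parser`; the page markup and the names `PAGE`, `TextCollector` and `scrape` are made up for this example. The script is read as plain text and never executed, so the content it would have written into the page is simply not there.

```python
from html.parser import HTMLParser

# Hypothetical page: the visible content is filled in by JavaScript at load time.
PAGE = """
<html><body>
<div id="price"></div>
<script>document.getElementById("price").textContent = "42";</script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Separates visible text from script source, as a non-browser scraper sees it."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.text = []     # non-empty text found outside <script>
        self.scripts = []  # raw script source, never executed

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.scripts.append(data)
        elif data.strip():
            self.text.append(data.strip())


def scrape(html):
    """Parse the HTML and return (visible_text, script_sources)."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.text, parser.scripts


text, scripts = scrape(PAGE)
print("visible text:", text)
print("script source seen:", "".join(scripts))
```

The parser gets the script source but the `div` stays empty, because nothing runs the JavaScript. If the data you need is only produced by scripts, a parse-only approach won't see it, whichever HTTP or parsing library you use.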