I'm trying to use Angular.js client-side with webapp2 on Google App Engine.
To solve the SEO issues, the idea was to use a headless browser to run the JavaScript server-side and serve the resulting HTML to crawlers.
Is there any headless browser for Python that runs on Google App Engine?
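The rendering idea in the question comes down to a routing decision: send crawler requests to a pre-rendered HTML snapshot and everyone else to the normal Angular.js page. A minimal sketch of that decision, assuming crawlers are recognized by substrings in the User-Agent header. The token list and the names `is_crawler` and `choose_response` are hypothetical, and User-Agent matching is only a heuristic.

```python
# Hypothetical substrings that identify common crawlers; an illustrative
# assumption, not an exhaustive or authoritative list.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandex", "baiduspider", "duckduckbot")


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    if not user_agent:
        return False
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def choose_response(user_agent, render_snapshot, app_shell):
    """Serve pre-rendered HTML to crawlers and the normal app shell to browsers.

    render_snapshot is a callable (e.g. one that drives a headless browser),
    so the expensive rendering only happens for crawler requests.
    """
    if is_crawler(user_agent):
        return render_snapshot()
    return app_shell
```

Passing the renderer as a callable keeps the costly headless-browser step off the path for regular visitors.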
This can now be done on the App Engine flexible environment (Flex) with a custom runtime. I'm adding this answer because this question is the first result that pops up in Google.
I based this custom runtime on another of my GAE Flex microservices, which uses the pre-built Python runtime.
Project Structure:
webdrivers/
- geckodriver
app.yaml
Dockerfile
main.py
requirements.txt
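Before running `docker build`, it can help to confirm that every file in the layout above is actually in place (a missing `webdrivers/geckodriver` only shows up at request time otherwise). A small sketch; `missing_project_files` is a hypothetical helper, and the file list simply mirrors the structure shown above.

```python
import os

# Files the layout above expects, relative to the project root.
REQUIRED_FILES = (
    "app.yaml",
    "Dockerfile",
    "main.py",
    "requirements.txt",
    os.path.join("webdrivers", "geckodriver"),
)


def missing_project_files(root="."):
    """Return the expected project files that are absent under root."""
    return [name for name in REQUIRED_FILES
            if not os.path.isfile(os.path.join(root, name))]
```

Run it from the project root; an empty list means the layout is complete.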
app.yaml:
service: my-app-engine-service-name
runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT main:app --timeout 180
Dockerfile:
FROM gcr.io/google-appengine/python
RUN apt-get update && apt-get install -y xvfb firefox
LABEL python_version=python
RUN virtualenv --no-download /env -p python
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
ADD requirements.txt /app/
RUN pip install -r requirements.txt
ADD . /app/
CMD exec gunicorn -b :$PORT main:app --timeout 180
requirements.txt:
Flask==0.12.2
gunicorn==19.7.1
selenium==3.13.0
pyvirtualdisplay==0.2.1
main.py:
import os
import traceback

from flask import Flask, jsonify
from selenium import webdriver
from pyvirtualdisplay import Display

app = Flask(__name__)

# Add the bundled webdrivers directory (which contains geckodriver) to the PATH
os.environ['PATH'] += os.pathsep + os.path.join(
    os.path.dirname(os.path.realpath(__file__)), 'webdrivers')


@app.route('/')
def hello():
    return 'Hello!!'


@app.route('/test/', methods=['GET'])
def go_headless():
    display = None
    driver = None
    try:
        # Firefox needs an X display; Xvfb (via pyvirtualdisplay) provides a virtual one
        display = Display(visible=0, size=(1024, 768))
        display.start()
        driver = webdriver.Firefox()
        driver.get("http://www.python.org")
        page_source = driver.page_source
        return jsonify({'success': True, 'result': page_source[:500]})
    except Exception as e:
        print(traceback.format_exc())
        return jsonify({'success': False, 'msg': str(e)})
    finally:
        # quit() ends the Firefox/geckodriver session; stop the virtual display last
        if driver is not None:
            driver.quit()
        if display is not None:
            display.stop()


if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
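main.py adds the webdrivers folder to the PATH so Selenium can find geckodriver. A slightly more defensive variant is sketched below. `add_webdrivers_to_path` and `find_geckodriver` are hypothetical helper names (not part of Selenium); the sketch assumes Python 3 for `shutil.which`. It prepends the folder rather than appending it, an assumption on my part so the bundled geckodriver wins over any other copy on the PATH, and it avoids adding the folder twice.

```python
import os
import shutil


def add_webdrivers_to_path(base_dir, subdir="webdrivers"):
    """Prepend base_dir/subdir to PATH (only once) and return that directory."""
    driver_dir = os.path.join(os.path.abspath(base_dir), subdir)
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if driver_dir not in parts:
        os.environ["PATH"] = os.pathsep.join([driver_dir] + parts)
    return driver_dir


def find_geckodriver():
    """Return the full path of the first executable geckodriver on PATH, or None."""
    return shutil.which("geckodriver")
```

Calling `find_geckodriver()` at startup and failing fast when it returns `None` gives a clearer error than the WebDriverException raised later by `webdriver.Firefox()`.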
Download geckodriver (Linux 64-bit) from https://github.com/mozilla/geckodriver/releases and put it in the webdrivers/ folder.
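The binary has to end up in webdrivers/ with its execute bit set, or Selenium cannot launch it. A minimal sketch, assuming you already downloaded the linux64 `.tar.gz` release archive and that it contains a single top-level file named `geckodriver`; `extract_geckodriver` is a hypothetical helper name.

```python
import os
import stat
import tarfile


def extract_geckodriver(archive_path, dest_dir="webdrivers"):
    """Copy geckodriver out of a release .tar.gz into dest_dir and mark it executable."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, "geckodriver")
    with tarfile.open(archive_path, "r:gz") as tar:
        # Assumption: the archive holds a top-level member named "geckodriver".
        member = tar.getmember("geckodriver")
        source = tar.extractfile(member)
        if source is None:
            raise ValueError("geckodriver entry in the archive is not a regular file")
        # Reading the member directly avoids writing arbitrary archive paths to disk.
        with open(target, "wb") as out:
            out.write(source.read())
    # Equivalent of `chmod +x`: add execute permission for user, group and others.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

Run it once from the project root before building the Docker image so the executable file is included by `ADD . /app/`.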
Other notes:
- If you get this error:
  WebDriverException: Message: Can't load the profile. Possible firefox version mismatch. You must use GeckoDriver instead for Firefox 48+. Profile Dir: /tmp/tmp 48P If you specified a log_file in the FirefoxBinary constructor, check it for details.
  it is related to disabling marionette with DesiredCapabilities().FIREFOX["marionette"] = False, which bypasses geckodriver; Firefox 48+ needs geckodriver. See https://github.com/SeleniumHQ/selenium/issues/5106
- display = Display(visible=0, size=(1024, 768)) is needed to fix this error: "How to fix Selenium WebDriverException: The browser appears to have exited before we could connect?"
To test locally:
docker build . -t my-docker-image-tag
docker run -p 8080:8080 --name=my-docker-container-name my-docker-image-tag
To deploy to app engine:
gcloud app deploy app.yaml --version dev --project my-app-engine-project-id
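Once the container is running (locally on port 8080, or after deploying), the /test/ endpoint can be checked from a small script instead of a browser. A sketch using only the Python 3 standard library; `parse_headless_response` and `smoke_test` are hypothetical names, and the JSON keys `success`, `result` and `msg` are the ones main.py returns.

```python
import json
from urllib.request import urlopen


def parse_headless_response(body):
    """Decode the JSON returned by /test/ and raise if the render failed."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError("headless render failed: %s" % payload.get("msg"))
    return payload["result"]


def smoke_test(base_url="http://localhost:8080"):
    """Call /test/ on the running service and return the first part of the page."""
    # 180 s matches the gunicorn --timeout used in app.yaml and the Dockerfile.
    with urlopen(base_url.rstrip("/") + "/test/", timeout=180) as resp:
        return parse_headless_response(resp.read())
```

For example, `print(smoke_test()[:200])` after `docker run`, or pass the base URL of the deployed service to check the App Engine version.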