python google-app-engine headless-browser

Python Headless Browser for GAE


I'm trying to use AngularJS on the client side with webapp2 on Google App Engine.

To solve the SEO issues, the idea is to use a headless browser to run the JavaScript server-side and serve the resulting HTML to crawlers.

Is there a headless browser for Python that runs on Google App Engine?
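
The idea described above, serving pre-rendered HTML only to crawlers, could be sketched roughly as follows. This is an illustration under assumptions, not part of any framework: `CRAWLER_PATTERN`, `is_crawler`, and `choose_response` are hypothetical names, and the list of User-Agent substrings is a small, non-exhaustive sample.

```python
import re

# Hypothetical, non-exhaustive sample of crawler User-Agent substrings.
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|yandex|baiduspider|duckduckbot", re.IGNORECASE
)


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a search crawler."""
    return bool(user_agent and CRAWLER_PATTERN.search(user_agent))


def choose_response(user_agent, render_snapshot, serve_app_shell):
    """Call render_snapshot() for crawlers and serve_app_shell() otherwise.

    Both callables are supplied by the caller: the first would return the
    HTML produced by a headless browser, the second the normal page that
    boots the JavaScript application.
    """
    if is_crawler(user_agent):
        return render_snapshot()
    return serve_app_shell()
```

In a request handler, `user_agent` would come from the request's `User-Agent` header, which may be missing, so `None` is treated as a regular visitor.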


Solution

  • This can now be done on the App Engine flexible environment with a custom runtime. I'm adding this answer because this question is the first result that pops up on Google.

    I based this custom runtime on my other GAE flex microservice, which uses the pre-built Python runtime.

    Project Structure:

    webdrivers/
    - geckodriver
    app.yaml
    Dockerfile
    main.py
    requirements.txt
    

    app.yaml:

    service: my-app-engine-service-name
    runtime: custom
    env: flex
    entrypoint: gunicorn -b :$PORT main:app --timeout 180
    

    Dockerfile:

    FROM gcr.io/google-appengine/python
    RUN apt-get update
    RUN apt-get install -y xvfb
    RUN apt-get install -y firefox
    LABEL python_version=python
    RUN virtualenv --no-download /env -p python
    ENV VIRTUAL_ENV /env
    ENV PATH /env/bin:$PATH
    ADD requirements.txt /app/
    RUN pip install -r requirements.txt
    ADD . /app/
    CMD exec gunicorn -b :$PORT main:app --timeout 180
    

    requirements.txt:

    Flask==0.12.2
    gunicorn==19.7.1
    selenium==3.13.0
    pyvirtualdisplay==0.2.1
    

    main.py:

    import os
    import traceback
    
    from flask import Flask, jsonify, Response
    from selenium import webdriver
    from pyvirtualdisplay import Display
    
    app = Flask(__name__)
    
    # Add the webdrivers to the path
    os.environ['PATH'] += ':'+os.path.dirname(os.path.realpath(__file__))+"/webdrivers"
    
    @app.route('/')
    def hello():
        return 'Hello!!'
    
    @app.route('/test/', methods=['GET'])
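    def go_headless():
        display = None
        driver = None
        try:
            # Xvfb virtual display so Firefox can start without a real screen
            display = Display(visible=0, size=(1024, 768))
            display.start()
            driver = webdriver.Firefox()
            driver.get("http://www.python.org")
            # Keep the source as text: slicing encoded bytes could cut a
            # multi-byte character in half and break JSON serialization
            page_source = driver.page_source
            return jsonify({'success': True, 'result': page_source[:500]})
        except Exception as e:
            # print() with a single argument works on Python 2 and 3
            print(traceback.format_exc())
            return jsonify({'success': False, 'msg': str(e)})
        finally:
            # Always release the browser and the display, even on errors
            if driver is not None:
                driver.quit()
            if display is not None:
                display.stop()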
    
    if __name__ == '__main__':
        app.run(host='127.0.0.1', port=8080, debug=True)
    

    Download geckodriver (the linux64 build) from the releases page below and place the binary in webdrivers/:

    https://github.com/mozilla/geckodriver/releases

    Other notes:

    To test locally:

    docker build . -t my-docker-image-tag
    docker run -p 8080:8080 --name=my-docker-container-name my-docker-image-tag
    

    To deploy to app engine:

    gcloud app deploy app.yaml --version dev --project my-app-engine-project-id
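
One more note on the /test/ handler in main.py: it starts Firefox on every request, which is slow (hence the 180 second gunicorn timeout). A minimal sketch of one way to soften that is to cache rendered HTML in memory for a while. `SnapshotCache` and its parameters are hypothetical names used for illustration, and the cache lives inside a single process, so each gunicorn worker and each instance would keep its own copy.

```python
import time


class SnapshotCache(object):
    """Keep rendered HTML per URL for ttl_seconds (in-memory, per process)."""

    def __init__(self, render, ttl_seconds=3600, clock=time.time):
        self._render = render      # callable: url -> html (e.g. via Selenium)
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # url -> (expires_at, html)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            # Still fresh: skip the expensive browser launch
            return entry[1]
        html = self._render(url)
        self._entries[url] = (now + self._ttl, html)
        return html
```

Here `render` would be a function wrapping the Display and webdriver.Firefox() calls from main.py, so the browser only runs when a URL is missing from the cache or its entry has expired.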