
Flask + Bokeh in Docker authentication


We're using Flask to route users to Bokeh servers, and the whole system runs inside a Docker container. Everything works well, but now we want to add authentication, which is proving difficult because we don't want to map the Bokeh server ports to the outside world.

Let me show you how it's currently working (without authentication):

Flask app.py (routing):

from flask import Flask, render_template
from bokeh.embed import server_document

app = Flask(__name__)
...
@app.route('/folder/report_x')
def page_folder_report_x():
    ''' embedded bokeh server for report_x '''
    script = server_document('http://localhost:5001/report_x')
    resp = {
        'title': 'Report X',
        'script': script,
        'template': 'Flask',
    }
    return render_template('embed.html', **resp)
...
app.run(host='0.0.0.0', port=5000, use_reloader=False)

Flask embed.html (template):

...
{% extends "base.html" %}
{% block content %}
  {{ script|safe }}
{% endblock %}
...

The Bokeh server is started with Panel from the command line (localhost:5000 is the Flask server):

panel serve report_x --port 5001 --allow-websocket-origin localhost:5000

The Bokeh app itself is defined in a main.ipynb notebook:

import panel as pn
from bokeh.models import ColumnDataSource, CustomJS
from bokeh.models.widgets import Button, DataTable, PreText
from bokeh.models.widgets import TableColumn, NumberFormatter, DateFormatter
...
gspec = pn.GridSpec(sizing_mode='stretch_both')
gspec[0:12, 0:12] = pn.WidgetBox(widgets)
...
gspec.servable()

Our Docker image exposes the ports of the Flask server and the Bokeh server(s):

...
RUN pip install -r /app/requirements.txt
EXPOSE 5000
EXPOSE 5001
...

Lastly, when we run the docker container we map the ports:

# success!
docker run -p 5000:5000 -p 5001:5001 report_server:0.1

If we run the docker image this way everything works perfectly.

But if we run it without mapping the Bokeh server port, we can't reach the Bokeh server (even though it's exposed internally, as you can see in the Dockerfile):

# fail
docker run -p 5000:5000 report_server:0.1
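For context, EXPOSE only records which ports the container listens on; it does not publish them, so a port is reachable from the host only if it is mapped with -p. A minimal sketch of a TCP probe for checking this from the host (is_port_open is a hypothetical helper, not part of Docker, Flask, or Bokeh):

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

Calling is_port_open("localhost", 5001) on the host after each of the two docker run commands above should show port 5001 as reachable only in the first case.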

For security purposes, we only want to map one port to the outside world. Is there something we're missing about how to embed Bokeh servers in Flask that would allow only Flask to talk to the Bokeh server?


Solution

  • Is there something we're missing about how to embed Bokeh servers in Flask that would allow only Flask to talk to the Bokeh server?

    The client (browser) has to be able to talk to the Bokeh server, full stop. All functions of the Bokeh server take place over a direct websocket connection between a Bokeh server and a browser. So the short answer to your question is "you can't".
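To make the "direct connection" point concrete: the script returned by server_document() tells the browser to connect straight to the Bokeh server URL it was given, so localhost:5001 is resolved by the browser on the user's machine, not inside the container. A minimal sketch of how an app URL maps to the websocket URL the browser opens, assuming Bokeh's convention of a ws:// (or wss://) endpoint at <app path>/ws; websocket_url_for_app is a hypothetical helper:

```python
from urllib.parse import urlsplit, urlunsplit


def websocket_url_for_app(app_url: str) -> str:
    """Map a Bokeh app URL to the websocket URL a browser would open.

    Assumes the app's websocket lives at <app path>/ws, with http -> ws
    and https -> wss.
    """
    parts = urlsplit(app_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    scheme = "wss" if parts.scheme == "https" else "ws"
    path = parts.path.rstrip("/") + "/ws"
    return urlunsplit((scheme, parts.netloc, path, parts.query, parts.fragment))


# The URL passed to server_document() in the question:
print(websocket_url_for_app("http://localhost:5001/report_x"))
# ws://localhost:5001/report_x/ws
```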

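    However, what you can do is configure the Bokeh server to:

      • only accept sessions whose ids are signed with a secret key that your Flask app also knows, and
      • only accept websocket connections from an allowed origin (the pages served by Flask).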

    To do this, first create a secret key for signing session ids, using the bokeh secret command, e.g.:

    export BOKEH_SECRET_KEY=`bokeh secret` 
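If the bokeh command isn't handy where the key is generated (for example in a build script), a long random string should serve the same purpose, as long as both processes share it. A minimal sketch using Python's secrets module as a stand-in for bokeh secret (not its implementation; make_secret_key is a hypothetical name):

```python
import secrets


def make_secret_key(nbytes: int = 32) -> str:
    """Return a random URL-safe string to use as BOKEH_SECRET_KEY.

    Stand-in for `bokeh secret`, not its implementation: 32 random bytes
    encode to a 43-character URL-safe string.
    """
    return secrets.token_urlsafe(nbytes)


key = make_secret_key()
print(f"export BOKEH_SECRET_KEY={key}")
```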
    

    Then also set BOKEH_SIGN_SESSIONS and the allowed websocket origin when starting the server:

    BOKEH_SIGN_SESSIONS=yes bokeh serve --allow-websocket-origin=<app origin> app.py
    

    Then, in your Flask app, explicitly provide (signed) session ids:

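    from bokeh.embed import server_session
    from bokeh.util.session_id import generate_session_id
    from flask import render_template

    # inside your Flask view function:
    script = server_session(url='http://localhost:5006/bkapp',
                            session_id=generate_session_id())
    return render_template("embed.html", script=script, template="Flask")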
    

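    Note that the BOKEH_SECRET_KEY environment variable needs to be set, and identical, for both the Bokeh server and the Flask process. Setting BOKEH_SIGN_SESSIONS=yes in both environments as well avoids any mismatch in whether ids get signed.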
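In a setup like the question's, where Flask and the Panel/Bokeh server run in the same container, one way to guarantee identical settings is to build a single environment and pass it to both processes. A minimal sketch, assuming a launcher script that starts both (shared_bokeh_env and the command lists are hypothetical, mirroring the question's ports; nothing is actually started here):

```python
import os
import secrets


def shared_bokeh_env(base=None, secret_key=None):
    """Return one environment dict to pass to BOTH the Flask and Panel processes.

    Reuses an existing BOKEH_SECRET_KEY if present, otherwise generates one,
    and turns on session signing so both sides agree.
    """
    env = dict(os.environ if base is None else base)
    env["BOKEH_SECRET_KEY"] = (
        secret_key or env.get("BOKEH_SECRET_KEY") or secrets.token_urlsafe(32)
    )
    env["BOKEH_SIGN_SESSIONS"] = "yes"
    return env


env = shared_bokeh_env()
# Hypothetical commands matching the question's setup:
panel_cmd = ["panel", "serve", "report_x", "--port", "5001",
             "--allow-websocket-origin", "localhost:5000"]
flask_cmd = ["python", "app.py"]
# A launcher would then run, e.g.:
#   subprocess.Popen(panel_cmd, env=env)
#   subprocess.Popen(flask_cmd, env=env)
```

Port 5001 still has to be published with -p in this arrangement, since the browser connects to it directly; signing only controls who can start a session there.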

    Now if anyone connects to the Bokeh server directly, they will get back a 403 error, unless the connection URL contains a signed session id, signed with the same secret that the Bokeh server was started with. Presumably only your Flask app knows this secret, so only it can successfully initiate new sessions.
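The idea behind signed session ids can be illustrated with an HMAC: the id carries a signature that only a holder of the secret can produce, and the server recomputes it to decide whether to accept the connection. A conceptual sketch only (sign_session_id and check_session_id are hypothetical names, and Bokeh's real id/token format differs and varies between versions):

```python
import hashlib
import hmac
import secrets


def _sign(base_id: str, secret_key: str) -> str:
    # HMAC-SHA256 of the random id, keyed with the shared secret.
    return hmac.new(secret_key.encode(), base_id.encode(), hashlib.sha256).hexdigest()


def sign_session_id(secret_key: str) -> str:
    """Create '<random id>-<hex signature>' (what the Flask side would hand out)."""
    base_id = secrets.token_urlsafe(24)
    return f"{base_id}-{_sign(base_id, secret_key)}"


def check_session_id(session_id: str, secret_key: str) -> bool:
    """Server-side check: accept only ids whose signature matches."""
    # The hex signature never contains '-', so split on the last one.
    base_id, sep, sig = session_id.rpartition("-")
    if not sep:
        return False
    # Constant-time comparison avoids leaking timing information.
    return hmac.compare_digest(sig, _sign(base_id, secret_key))
```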

    Is this enough to completely secure things? Technically, anyone who can see the connection string sent to the browser (e.g. the user viewing the app, or a sophisticated MitM attacker, especially if you don't terminate HTTPS in front of the app) could extract the signed session id. But as long as you set the allowed websocket origin, that information can't be used to open a new connection from a page served anywhere outside your app. If someone tries, the server refuses the connection with a 403 and logs:

    ERROR:bokeh.server.views.ws:Refusing websocket connection from Origin 'http://localhost:5006'; use --allow-websocket-origin=localhost:5006 or set BOKEH_ALLOW_WS_ORIGIN=localhost:5006 to permit this; currently we allow origins {'localhost:8000'}
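The origin check itself is conceptually simple: the server compares the host:port in the browser's Origin header against the allow-list. A simplified sketch that reproduces the decision in the log line above (origin_allowed is hypothetical; Bokeh's own matching may handle more cases, such as default ports):

```python
from typing import Iterable
from urllib.parse import urlsplit


def origin_allowed(origin_header: str, allowed: Iterable[str]) -> bool:
    """Return True if the Origin's host[:port] appears in the allow-list."""
    parts = urlsplit(origin_header)
    if not parts.hostname:
        # e.g. "null" origins or malformed values are rejected.
        return False
    return parts.netloc.lower() in {a.lower() for a in allowed}


# Mirrors the log line: origin localhost:5006, allow-list {'localhost:8000'}.
print(origin_allowed("http://localhost:5006", {"localhost:8000"}))  # False
```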

    I don't think a page running in a real browser can fake its Origin header, though someone could in principle build a modified browser (e.g. Chrome from source; not easy, but not impossible) that spoofs one. Non-browser clients, such as a script, can send any Origin header they like, so the origin check mainly stops other web pages from reusing a leaked id. If you need to guard against these cases, the Bokeh Project Discourse is probably a better place to continue the discussion, as it is somewhat open-ended and may point to new feature development (e.g. the ability to limit connections per session, or to make session ids never reusable).
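Outside a browser, Origin is just an ordinary request header chosen by the client. A minimal illustration with Python's standard library that builds, but does not send, a request carrying an arbitrary Origin (the URL is the example one from the Flask snippet above; a real websocket handshake would need further headers):

```python
import urllib.request

# Build (but do not send) a request with an Origin header of the client's choosing.
req = urllib.request.Request(
    "http://localhost:5006/bkapp",
    headers={"Origin": "http://localhost:8000"},  # any value the script picks
)
print(req.get_header("Origin"))  # http://localhost:8000
```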

    For reference, there is a complete example here that also embeds a Bokeh server directly inside a Flask process (if you need to scale out or expect many simultaneous users, this deployment would be too naive):

    https://gist.github.com/bryevdv/481fc64c59620acb74c64bff0f4d47d0

    As a last comment, you could probably also (additionally) put the Bokeh server URL behind an authenticating proxy of some sort, to prevent the WS upgrade from happening at all without authentication, though I am not sure exactly what that would look like offhand. That would also be better discussed on the Discourse.