serialization apache-arrow apache-arrow-flight

What is the best way to send Arrow data to the browser?


I have Apache Arrow data on the server (Python) and need to use it in the browser. It appears that Arrow Flight isn't implemented in JS. What are the best options for sending the data to the browser and using it there?

I don't even need it necessarily in Arrow format in the browser. This question hasn't received any responses, so I'm adding some additional criteria for what I'm looking for:

Surely this is a solved problem? If it is I've been unable to find a solution. Please help!


Solution

  • Building on David Li's comments on your original post, you can implement a non-streaming version of what you want without much code, using PyArrow on the server side and the Apache Arrow JS bindings on the client. The Arrow IPC format satisfies your requirements because it ships the schema with the data, is space-efficient and zero-copy, and is cross-platform.

    Here's a toy example that generates a record batch on the server and receives it on the client:

    Server:

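    from io import BytesIO
    
    from flask import Flask, send_file
    from flask_cors import CORS
    import pyarrow as pa
    
    app = Flask(__name__)
    CORS(app)  # allow the browser to fetch from a different origin
    
    @app.get("/data")
    def data():
        # Build a small record batch with three columns
        columns = [
            pa.array([1, 2, 3, 4]),
            pa.array(['foo', 'bar', 'baz', None]),
            pa.array([True, None, False, True])
        ]
        batch = pa.record_batch(columns, names=['f0', 'f1', 'f2'])
    
        # Serialize the batch (schema included) to an in-memory IPC stream
        sink = pa.BufferOutputStream()
        with pa.ipc.new_stream(sink, batch.schema) as writer:
            writer.write_batch(batch)
    
        # send_file's second positional argument is the MIME type, so pass
        # the MIME type and the file name by keyword
        return send_file(
            BytesIO(sink.getvalue().to_pybytes()),
            mimetype="application/vnd.apache.arrow.stream",
            download_name="data.arrow",
        )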
    

    Client:

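    import { tableFromIPC } from "apache-arrow";
    
    // Assumes the Flask dev server's default address; adjust as needed.
    const table = await tableFromIPC(fetch("http://localhost:5000/data"));
    // Do what you like with your data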
    

    Edit: I added a runnable example at https://github.com/amoeba/arrow-python-js-ipc-example.