I have Apache Arrow data on the server (Python) and need to use it in the browser. It appears that Arrow Flight isn't implemented in JS. What are the best options for sending the data to the browser and using it there?
I don't even need it necessarily in Arrow format in the browser. This question hasn't received any responses, so I'm adding some additional criteria for what I'm looking for:
Surely this is a solved problem? If it is I've been unable to find a solution. Please help!
Building off of the comments on your original post by David Li, you can implement a non-streaming version what you want without too much code using PyArrow on the server side and the Apache Arrow JS bindings on the client. The Arrow IPC format satisfies your requirements because it ships the schema with the data, is space-efficient and zero-copy, and is cross-platform.
Here's a toy example showing generating a record batch on server and receiving it on the client:
Server:
from io import BytesIO
from flask import Flask, send_file
from flask_cors import CORS
import pyarrow as pa
app = Flask(__name__)
CORS(app)
@app.get("/data")
def data():
data = [
pa.array([1, 2, 3, 4]),
pa.array(['foo', 'bar', 'baz', None]),
pa.array([True, None, False, True])
]
batch = pa.record_batch(data, names=['f0', 'f1', 'f2'])
sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
writer.write_batch(batch)
return send_file(BytesIO(sink.getvalue().to_pybytes()), "data.arrow")
Client
const table = await tableFromIPC(fetch(URL));
// Do what you like with your data
Edit: I added a runnable example at https://github.com/amoeba/arrow-python-js-ipc-example.