How to sort a PyArrow table?


How do I sort an Arrow table in PyArrow?

There does not appear to be a single function that does this; the closest is sort_indices.


Solution

  • Since version 7.0.0, PyArrow provides Table.sort_by, so there is no need to call the compute functions manually (reference)

    import pyarrow as pa

    table = pa.table([
        pa.array(["a", "a", "b", "b", "b", "c", "d", "d", "e", "c"]),
        pa.array([15, 20, 3, 4, 5, 6, 10, 1, 14, 123]),
    ], names=["keys", "values"])
    # Sort all rows by the "values" column, smallest first
    sorted_table = table.sort_by([("values", "ascending")])