apache-spark apache-flink streaming flink-batch

What is the best way to order a batch in Flink or Spark?


I'm building a batch-mode job in Flink. I have a table with 100 million rows and I need to sort the whole table by one field, so I'm wondering which technology is better suited to sorting a huge table.

I would prefer to use Flink. If Flink is a good fit for distributed sorting, can you suggest a way to order the rows that follows good practices in Flink?

I'd like to know the best way to order a table in Flink.


Solution

  • In Flink, I would use Flink SQL or the Table API for this:

    https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/sql/queries/orderby/

    https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/table/tableapi/#orderby-offset--fetch
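
As a rough illustration of the Flink SQL route, here is a minimal sketch for the SQL client. The table name `events`, its columns, the file path and the CSV format are all made up for the example; replace them with your own source. It only assumes that the job runs in batch mode and that the table is read through the filesystem connector.

```sql
-- Run with the batch runtime (SQL client syntax).
SET 'execution.runtime-mode' = 'batch';

-- Hypothetical source table: name, columns, path and format are assumptions.
CREATE TABLE events (
  id BIGINT,
  sort_key BIGINT,
  payload STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/events',
  'format' = 'csv'
);

-- Sort the whole table by a single field.
SELECT id, sort_key, payload
FROM events
ORDER BY sort_key;

-- Inspect the plan to see how Flink intends to execute the sort.
EXPLAIN
SELECT id, sort_key, payload
FROM events
ORDER BY sort_key;
```

In batch mode you can order by any column. The restriction that the primary sort key must be an ascending time attribute only applies in streaming mode. With 100 million rows it is worth looking at the `EXPLAIN` output before running the job, because a global sort has to produce one total order over all rows and the sort stage can end up being the bottleneck.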