apache-flink flink-streaming flink-statefun

How to create a savepoint automatically in a Flink Stateful Functions application?


I am trying to dive into the new Stateful Functions approach, and I have already tried creating a savepoint manually (https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/deployment-and-operations/state-bootstrap.html#creating-a-savepoint).

It works like a charm, but I can't find a way to do it automatically. For example, I have a couple of million keys and I need to write them all to a savepoint.


Solution

  • Is your question about how to replace the env.fromElements call in the example with something that reads from a file or another data source? Flink's DataSet API, which is what's used here, can read from any HadoopInputFormat. See DataSet Connectors for details.

    There are easy-to-use shortcuts for common cases. If you just want to read data from a file using a TextInputFormat, that would look like this:

    env.readTextFile(path)
    

    and to read from a CSV file using a CsvInputFormat:

    env.readCsvFile(path)
    

    See Data Sources for more about working with these shortcuts.

    If I've misinterpreted the question, please clarify your concerns.
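
Whichever source you pick, each record read from it still has to be turned into the key/value pair that gets written to the savepoint. As a rough sketch of that step only, here is a plain-Java parser, assuming a hypothetical input format of one `key,count` pair per line. The class name `SeenCountParser` and the line format are my own illustration, not something from the Stateful Functions docs. In an actual job, logic like `parseLine` would presumably sit inside a map function applied to the result of env.readTextFile(path).

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Flink): parses "key,count" lines
// into key/value pairs that a bootstrap job could then consume.
final class SeenCountParser {

    // Parses a single line such as "foo,4" into the pair ("foo", 4).
    // Splits on the LAST comma so that keys may themselves contain commas.
    static Map.Entry<String, Integer> parseLine(String line) {
        int comma = line.lastIndexOf(',');
        if (comma <= 0) {
            // No comma at all, or an empty key before the comma.
            throw new IllegalArgumentException("Expected 'key,count' but got: " + line);
        }
        String key = line.substring(0, comma).trim();
        int count = Integer.parseInt(line.substring(comma + 1).trim());
        return new SimpleImmutableEntry<>(key, count);
    }

    // Parses every non-blank line, preserving input order.
    static List<Map.Entry<String, Integer>> parseAll(List<String> lines) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                result.add(parseLine(line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("foo,4", "", "bar,3");
        for (Map.Entry<String, Integer> e : parseAll(sample)) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Malformed lines fail fast with an exception here. With millions of keys you may prefer to log and skip bad records instead, depending on how clean your input is.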