I am trying to dive into the new Stateful Functions approach, and I have already tried creating a savepoint manually (https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/deployment-and-operations/state-bootstrap.html#creating-a-savepoint).
It works like a charm, but I can't find a way to do it automatically. For example, I have a couple of million keys and I need to write them all to the savepoint.
Is your question about how to replace the env.fromElements in the example with something that reads from a file or other data source? Flink's DataSet API, which is what's used here, can read from any HadoopInputFormat. See DataSet Connectors for details.
There are easy-to-use shortcuts for common cases. If you just want to read data from a file using a TextInputFormat, that would look like this:
env.readTextFile(path)
and to read from a CSV file using a CsvInputFormat:
env.readCsvFile(path)
See Data Sources for more about working with these shortcuts.
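If the keys live in a text file, each line returned by env.readTextFile(path) still has to be turned into a record before it can stand in for the elements passed to env.fromElements. Here is a minimal sketch of that parsing step in plain Java. It assumes one "key,count" pair per line, which is a guess at the file layout, and KeyCountLineParser is a hypothetical name, not part of Flink or Stateful Functions.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical helper: turns one "key,count" line into a (key, count) pair. */
final class KeyCountLineParser {

    private KeyCountLineParser() {}

    static Map.Entry<String, Integer> parse(String line) {
        // Assumed layout: the key, a comma, then an integer count.
        int comma = line.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected 'key,count' but got: " + line);
        }
        // Everything before the first comma is the key; the rest is the count.
        String key = line.substring(0, comma).trim();
        int count = Integer.parseInt(line.substring(comma + 1).trim());
        return new SimpleImmutableEntry<>(key, count);
    }
}
```

Inside the bootstrap job, the same logic could sit in a MapFunction applied to the DataSet that env.readTextFile(path) returns, so that the mapped DataSet replaces the one built with env.fromElements.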
If I've misinterpreted the question, please clarify your concerns.