persistenceapache-flinkflink-streamingcheckpointing

How to persist a Queryable State in Flink?


I am using FLink v.1.4.0. I am using a QueryableStateStream which I key in some way and then sink it to create a Queryable State, e.g:

stream.keyBy(0).asQueryableState("query-name");

That's all good as long as my Flink job is running. As soon as the job is killed the state is no longer accessible.

I have two questions:

  1. How do I persist the queryable state? Can this be done at regular intervals like checkpointing? Should I use the Managed State solution instead?
  2. How can I initialise a QueryableState with data persisted from a previous execution?

I would appreciate practical examples for both questions. Thanks.


Solution

  • Queryable state is managed state, and it will be checkpointed and restored. Of course, it is true that Flink state isn't accessible while your application isn't running.

    You could attach something like redis or cassandra or whatever database you prefer as a sink to your job (or a compacted Kafka topic). This will make the data available while your Flink job isn't running. But it's worth considering whether keeping a database (or Kafka) running is easier than keeping your flink job up.

    There's no need to re-initialize the state from the external database, since Flink will restore its state from a checkpoint or savepoint. But you could do that in the open() method of a RichFunction, if need be.