apache-flink

When should I use transient in Flink, and when not?


In the code below, should I use transient?

When can I use transient, and what difference does it make?

I need your help.

private Map<String, HermesCustomConsumer> topicSourceMap = new ConcurrentHashMap<>();
private Map<TopicAndPartition, Long> currentOffsets = new HashMap<>();
private transient Map<TopicAndPartition, Long> restoredState;

Solution

  • TL;DR
    If you declare a field transient, instantiate it in the open() method of an operator that implements a Rich interface. Otherwise, initialize the field where you declare it.

    The fields you use here are what is called raw state, i.e. state managed by the user. Whether you should use the transient modifier depends on whether the field needs to be serialized. When you submit a Flink job, the computation topology is generated and distributed to the Flink cluster, and the operators, including sources and sinks, are serialized together with their fields, e.g. topicSourceMap in your code. topicSourceMap and currentOffsets are initialized when the operator is constructed, so their values are serialized and shipped with it. restoredState, however, is transient: no matter what initial value you assign to it, that value is not serialized and never reaches the task that executes the operator, so the field is null there after deserialization. That is why you usually need to instantiate it in the open() method of an operator that implements a Rich interface. After the operator has been deserialized in a task, open() is invoked, and that is where you instantiate your own state.
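
The behavior described above can be reproduced with plain Java serialization, which is what Flink uses to ship user functions to the cluster. The following is only a minimal sketch, not Flink code: FakeSource and its open() method are hypothetical stand-ins for a rich function and its open() hook, roundTrip() imitates shipping the operator from the client to a task, and String keys replace TopicAndPartition to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class TransientDemo {

    // Hypothetical stand-in for a Flink rich function; this is not a Flink class.
    static class FakeSource implements Serializable {
        private static final long serialVersionUID = 1L;

        // Non-transient: the content built on the client is serialized and shipped.
        Map<String, Long> currentOffsets = new HashMap<>();

        // Transient: skipped by serialization, so this initializer is lost on the worker.
        transient Map<String, Long> restoredState = new HashMap<>();

        // Imitates the open() hook: called on the worker after deserialization.
        void open() {
            restoredState = new HashMap<>();
        }
    }

    // Serializes and deserializes the object, as happens when an operator is shipped to a task.
    static FakeSource roundTrip(FakeSource source) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(source);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (FakeSource) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeSource onClient = new FakeSource();
        onClient.currentOffsets.put("topic-0", 42L);

        FakeSource onWorker = roundTrip(onClient);
        System.out.println(onWorker.currentOffsets); // prints {topic-0=42}
        System.out.println(onWorker.restoredState);  // prints null

        onWorker.open();
        System.out.println(onWorker.restoredState);  // prints {}
    }
}
```

The non-transient map survives the round trip with its content, while the transient map comes back as null even though it had an initializer, because field initializers do not run again during deserialization. Only after open() is called does the transient field hold a usable map, which is why that method is the usual place to set such fields up.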