
Apache Flink trigger that fires when a state size threshold is reached


I would like to implement an Apache Flink trigger that fires when the state reaches 256 MB. My sink writes Parquet files to HDFS and I want to run ETL on them later, so I don't want files that are too small or too large. My source (an Apache Kafka topic) changes in volume constantly.

I didn't find a way to do it. I found the StateObject interface, which has a size() function, but I didn't find a way to use it.


Solution

  • I would use a Flink FileSink with the Parquet bulk format and a rolling policy that caps the file size but also rolls once your maximum allowable latency has elapsed, so low-volume periods still produce output.
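
To make the "cap the size, but also roll on latency" idea concrete, here is a minimal plain-Java sketch of the decision such a rolling policy would make. It is a model only, not Flink code: the class name `SizeOrAgeRollingSketch`, the method `shouldRoll`, and the 15-minute latency bound are assumptions for illustration. In Flink itself, the equivalent checks would live in a custom policy extending `CheckpointRollingPolicy`, using the part file's size and creation time; bulk formats such as Parquet also roll on every checkpoint, so the checkpoint interval bounds file size and latency as well.

```java
/** Plain-Java model of a "roll on size or age" decision; not a Flink class. */
final class SizeOrAgeRollingSketch {

    // Target upper bound for a part file, taken from the question (256 MB).
    static final long MAX_PART_SIZE_BYTES = 256L * 1024 * 1024;

    // Assumed maximum allowable latency (15 minutes); tune to your ETL needs.
    static final long MAX_PART_AGE_MS = 15L * 60 * 1000;

    /**
     * Returns true when the in-progress part file should be closed, either
     * because it reached the size cap or because it has been open too long.
     */
    static boolean shouldRoll(long partSizeBytes, long partCreationTimeMs, long nowMs) {
        boolean tooLarge = partSizeBytes >= MAX_PART_SIZE_BYTES;
        boolean tooOld = nowMs - partCreationTimeMs >= MAX_PART_AGE_MS;
        return tooLarge || tooOld;
    }

    public static void main(String[] args) {
        long created = 0L;
        // Small and young file: keep writing.
        System.out.println(shouldRoll(10L * 1024 * 1024, created, 60_000L));      // false
        // Size cap reached: roll regardless of age.
        System.out.println(shouldRoll(256L * 1024 * 1024, created, 60_000L));     // true
        // Low-volume period: roll once the latency bound has elapsed.
        System.out.println(shouldRoll(10L * 1024 * 1024, created, 15L * 60_000L)); // true
    }
}
```

The size check keeps files from growing too large during traffic spikes, while the age check bounds how long data waits when the Kafka topic is quiet; files written in quiet periods will be smaller than the target, which is the trade-off for bounded latency.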