I would like to implement an Apache Flink trigger that fires when the state accumulates 256 MB. My sink writes Parquet files to HDFS, and I want to run ETL on them later, so I don't want files that are too small or too large; my source (an Apache Kafka topic) varies in volume constantly.
I didn't find a way to do this. I found the StateObject interface, which has a size() method, but I couldn't find a way to use it.
I would use a Flink FileSink with the Parquet bulk format, together with a rolling policy that constrains the part-file size but also rolls based on your maximum allowable latency. This sidesteps the trigger/state question entirely: the sink itself decides when to close a file.
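As a sketch of what that could look like: for bulk formats, FileSink requires a rolling policy that extends `CheckpointRollingPolicy` (part files must roll on every checkpoint so they can be finalized). You can subclass it to also roll on size and on elapsed time. The record class `MyRecord`, the size/latency thresholds, and the output path below are placeholders; I'm assuming a recent Flink version where `AvroParquetWriters` lives in `flink-parquet`.

```java
import java.io.IOException;

import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetWriters;
import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.CheckpointRollingPolicy;

/**
 * Rolls a part file when it exceeds a target size OR when it has been
 * open longer than the maximum allowable latency. Checkpoint rolls are
 * inherited (and mandatory) from CheckpointRollingPolicy.
 */
public class SizeAndLatencyRollingPolicy<IN, BucketID>
        extends CheckpointRollingPolicy<IN, BucketID> {

    private final long maxPartSizeBytes;      // e.g. 256 MB
    private final long rolloverIntervalMs;    // max latency before closing a file

    public SizeAndLatencyRollingPolicy(long maxPartSizeBytes, long rolloverIntervalMs) {
        this.maxPartSizeBytes = maxPartSizeBytes;
        this.rolloverIntervalMs = rolloverIntervalMs;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileInfo<BucketID> partFileState, IN element)
            throws IOException {
        // Roll once the in-progress file reaches the size target.
        return partFileState.getSize() >= maxPartSizeBytes;
    }

    @Override
    public boolean shouldRollOnProcessingTime(PartFileInfo<BucketID> partFileState,
                                              long currentTime) {
        // Roll if the file has been open past the latency bound.
        return currentTime - partFileState.getCreationTime() >= rolloverIntervalMs;
    }
}
```

Wiring it into the sink (hypothetical `MyRecord` POJO):

```java
FileSink<MyRecord> sink = FileSink
        .forBulkFormat(new Path("hdfs:///data/out"),
                       AvroParquetWriters.forReflectRecord(MyRecord.class))
        .withRollingPolicy(new SizeAndLatencyRollingPolicy<>(
                256L * 1024 * 1024,   // 256 MB size target
                15 * 60 * 1000L))     // 15-minute latency bound
        .build();
```

One caveat: because bulk-format files always roll on checkpoints, the effective file size is also bounded by how much data arrives per checkpoint interval, so you may need a longer checkpoint interval to actually reach 256 MB. Note that `shouldRollOnEvent` checks size per record, so files will slightly overshoot the threshold by up to one Parquet row-group flush.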