Tags: apache-spark, apache-flink, streaming, flink-batch

Flink FileSink produces multiple output files


When I execute a job in batch mode in Flink, the FileSink generates multiple files, one per parallel instance. I want the output in a single file without changing the job's parallelism. How can I do that?

For example, I set the parallelism to 5 and want to get only one file in the output of the FileSink.

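    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.connector.file.sink.FileSink;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;
    import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;

    // Name part files "prefix-...<suffix>" with a .txt extension.
    OutputFileConfig config = OutputFileConfig
            .builder()
            .withPartPrefix("prefix")
            .withPartSuffix(".txt")
            .build();

    // Row-format sink that writes every record directly under the base path.
    final FileSink<String> sinkfile = FileSink
            .forRowFormat(new Path("src/main/resources/output"), new SimpleStringEncoder<String>("UTF-8"))
            .withBucketAssigner(new BasePathBucketAssigner<>())
            .withOutputFileConfig(config)
            .build();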

Solution

  • You can reduce the parallelism of the sink to 1 while leaving the rest of the pipeline at 5. That way you'll have just one output stream. If you really need to scale the sink up to 5 instances, they will operate independently and create 5 separate output streams.
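As a minimal sketch of that suggestion, the job below keeps the environment at parallelism 5 and overrides only the sink operator with `setParallelism(1)`. The class name, job name, sample input, and `map` step are assumptions made for illustration; the sink configuration mirrors the question. It needs the Flink streaming and file connector dependencies on the classpath.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.OutputFileConfig;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.BasePathBucketAssigner;

// Hypothetical job class, used only to illustrate the answer.
public class SingleFileSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        env.setParallelism(5); // default parallelism for the rest of the pipeline

        // Placeholder input and transformation (assumption); the map runs with parallelism 5.
        DataStream<String> lines = env
                .fromElements("a", "b", "c")
                .map(value -> value.toUpperCase());

        OutputFileConfig config = OutputFileConfig
                .builder()
                .withPartPrefix("prefix")
                .withPartSuffix(".txt")
                .build();

        FileSink<String> sinkfile = FileSink
                .forRowFormat(new Path("src/main/resources/output"), new SimpleStringEncoder<String>("UTF-8"))
                .withBucketAssigner(new BasePathBucketAssigner<>())
                .withOutputFileConfig(config)
                .build();

        // Override the parallelism of the sink operator only.
        lines.sinkTo(sinkfile).setParallelism(1);

        env.execute("single-file-sink");
    }
}
```

With a single sink subtask, all records funnel through one writer. Note that for large outputs the sink's rolling policy may still split that one stream into several part files, so a custom rolling policy might be needed if exactly one file is a hard requirement.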