I have deployed three Spring Boot apps with Hazelcast Jet embedded. The nodes recognize each other and run as a cluster. I have the following code: a simple read from a CSV and a write to a file sink. But Jet writes duplicates to the sink; to be precise, Jet processes the total number of entries in the CSV multiplied by the number of nodes. So if I have 10 entries in the source and 3 nodes, I see 3 files in the sink, each containing all 10 entries. I want each record to be processed once and only once. Here is my code:
Pipeline p = Pipeline.create();
BatchSource<List<String>> source = Sources.filesBuilder("files")
.glob("*.csv")
.build(path -> Files.lines(path).skip(1).map(line -> split(line)));
p.readFrom(source)
.writeTo(Sinks.filesBuilder("out").build());
instance.newJob(p).join();
If the nodes read from a shared file system, the sharedFileSystem attribute on the FilesBuilder must be set to true. Without it, each member assumes the files in the directory are local to it, so every member reads all of them and you get one full copy of the data per node.
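A minimal sketch of the corrected pipeline, assuming the "files" directory sits on a file system visible to all three members (the split helper here is a hypothetical stand-in for the split(line) method from the question, and Jet.newJetInstance() stands in for however the embedded instance is obtained):

```java
import com.hazelcast.jet.Jet;
import com.hazelcast.jet.JetInstance;
import com.hazelcast.jet.pipeline.BatchSource;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.Sources;

import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

public class CsvCopyJob {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create();

        BatchSource<List<String>> source = Sources.filesBuilder("files")
                .glob("*.csv")
                // The directory is shared between members, so tell Jet not to
                // treat each member's view of it as separate local files.
                // Each file is then read by only one member.
                .sharedFileSystem(true)
                .build(path -> Files.lines(path).skip(1).map(CsvCopyJob::split));

        p.readFrom(source)
         .writeTo(Sinks.filesBuilder("out").build());

        JetInstance instance = Jet.newJetInstance();
        instance.newJob(p).join();
    }

    // Hypothetical stand-in for the question's split(line) helper.
    private static List<String> split(String line) {
        return Arrays.asList(line.split(","));
    }
}
```

Note that the sink still writes one output file per processing member; sharedFileSystem(true) only ensures each source record is read once, so the union of the output files will contain each entry exactly once.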