apache-flinkapache-beam

Apache Beam DirectRunner vs FlinkRunner example


i've built the simplest pipeline using beam yaml (python sdk) where a csv file is read and should be printed to log. while running with default DirectRunner:

python -m apache_beam.yaml.main --pipeline_spec_file=pipeline-01.yaml 

everything works fine and i do see the output, however when using FlinkRunner:

python -m apache_beam.yaml.main --pipeline_spec_file=pipeline-01.yaml --runner=FlinkRunner --flink_version=1.16 --flink_master=localhost:8081 --environment_type=EXTERNAL --environment_config=localhost:50000

no logs are printed, even though i can see through the Flink Dashboard that the run succeeded.

my pipeline:

pipeline:
  type: chain
  transforms:
    - type: ReadFromCsv
      config:
        path: data/input2.csv
    - type: LogForTesting

the path is to a file stored locally on my computer.

can anyone clarify? Thanks


Solution

  • The answer to this was quite embarrassing..

    my pipeline was looking at a file saved locally, but i forgot to copy it into flink cluster (so basically, there were no logs because the file was "empty" when running with FlinkRunner. Once file was copied it worked great :)