I would like to know if anyone has implemented Camunda as a scheduler and orchestrator of data pipelines/ETL and can share their experience.
What are the pros and cons of using it instead of, for example, Airflow?
Thanks!
Camunda
Camunda does not offer ready-made connectors (e.g. S3, databases, MongoDB, RabbitMQ, Kafka, Power BI), which makes it a weak candidate for ETL. One may say that you can write custom processors; that is true, but you need to write them in Java to achieve ETL. I found it suitable for modeling human-in-the-loop decision processes.
Apache Airflow
I have run numerous experiments with Apache Airflow (https://github.com/kurtzace/airflow-experiments), and it handles DAGs well. It has numerous ready-to-use connectors, though you need a little bit of Python. Using Spiff (SpiffWorkflow), we can also run BPMN-style experiments, which need less code than Camunda or plain Apache Airflow.
Cons: steep learning curve; mostly used for data science pipelines.
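To illustrate what "making DAGs" means for an ETL pipeline, here is a minimal stdlib-only sketch, not Airflow's API. It assumes two hypothetical extract steps that fan in to a transform step, followed by a load step, and runs them in dependency order with Python's `graphlib.TopologicalSorter`. In Airflow, each function would roughly correspond to a task, and the `DAG` mapping would be declared with the `>>` dependency operator instead; all task names and data here are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL steps; each reads from and writes to a shared context dict.
def extract_orders(ctx):
    ctx["orders"] = [
        {"customer_id": 1, "amount": 10},
        {"customer_id": 2, "amount": 5},
        {"customer_id": 1, "amount": 7},
    ]

def extract_customers(ctx):
    ctx["customers"] = {1: "alice", 2: "bob"}

def transform(ctx):
    # Join orders to customer names and sum the amount per customer.
    totals = {}
    for order in ctx["orders"]:
        name = ctx["customers"][order["customer_id"]]
        totals[name] = totals.get(name, 0) + order["amount"]
    ctx["totals"] = totals

def load(ctx):
    # Stand-in for writing to a warehouse table.
    ctx["warehouse"] = sorted(ctx["totals"].items())

# Each task maps to the set of upstream tasks it depends on.
DAG = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load": {"transform"},
}
TASKS = {
    "extract_orders": extract_orders,
    "extract_customers": extract_customers,
    "transform": transform,
    "load": load,
}

def run_dag(dag, tasks):
    """Run every task once, upstream tasks first; return (order, context)."""
    ctx = {}
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx

if __name__ == "__main__":
    order, ctx = run_dag(DAG, TASKS)
    print(order)
    print(ctx["warehouse"])  # [('alice', 17), ('bob', 5)]
```

A real orchestrator adds scheduling, retries, and parallel execution of independent tasks (here, the two extract steps) on top of this ordering.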
Apache NiFi
At the other extreme, I found Apache NiFi to be better suited to ETL. It needs less code by comparison and has many prebuilt processors for batch/file handling, HTTP/HTTPS/REST, S3, JSON and CSV transformations, database connectivity, concatenation, merging, and filtering.
Cons: NiFi is not a good fit for (a) processing that takes more than 15 minutes, (b) acting as a Spark-like distributed compute engine, (c) data volumes above a GB per connection, (d) complex joins or rolling windows, or (e) RabbitMQ-style eventing.
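For a sense of the code NiFi's prebuilt CSV, filter, and JSON processors save you from writing, here is a rough hand-written Python equivalent of a "CSV in, filter, JSON out" flow. It is only a sketch under assumptions: the column names, the `min_amount` threshold, and the sample data are hypothetical, and it does not reproduce how NiFi itself handles flow files, back pressure, or provenance.

```python
import csv
import io
import json

def csv_to_records(text):
    """Parse CSV text (with a header row) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_records(records, min_amount):
    """Keep only records whose 'amount' column is at least min_amount."""
    return [r for r in records if float(r["amount"]) >= min_amount]

def records_to_json(records):
    """Serialize the surviving records as a JSON array."""
    return json.dumps(records)

# Hypothetical input file contents.
SAMPLE = "id,amount\n1,10.5\n2,0.5\n3,42\n"

if __name__ == "__main__":
    kept = filter_records(csv_to_records(SAMPLE), min_amount=1)
    print(records_to_json(kept))
    # [{"id": "1", "amount": "10.5"}, {"id": "3", "amount": "42"}]
```

In NiFi, each of these three functions would instead be a configured processor on the canvas, which is where the "less code" advantage comes from.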