amazon-web-services amazon-s3 debezium data-lake data-pipeline

Set Up a Data Pipeline Flow in AWS


Problem Statement: We have a Postgres RDS instance (managed by AWS), and we need to set up a data lake in S3 for all the data in RDS. The data should be pushed to S3 in near real time, and the solution must handle insert, update, and delete operations. One limitation: we can't use the AWS Data Pipeline service because it is not available in the desired region.


Solution

  • This link was a great help; with slight modifications here and there, it helped me set up the pipeline: https://aws.amazon.com/blogs/big-data/creating-a-source-to-lakehouse-data-replication-pipe-using-apache-hudi-aws-glue-aws-dms-and-amazon-redshift/
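
To make the insert/update/delete requirement concrete, here is a minimal sketch of how a stream of change records can be folded into a current snapshot. It assumes each change record carries an `Op` field with the values `I`, `U`, or `D` (the convention AWS DMS uses when writing CDC files to S3) and that rows have a primary key column named `id`. The `apply_changes` helper and the sample rows are hypothetical and only illustrate the idea; in the linked post this merge step is handled by Apache Hudi inside an AWS Glue job, not by hand-written code like this.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(
    snapshot: Dict[Any, Row],
    changes: Iterable[Row],
    key: str = "id",  # assumed primary key column name
) -> Dict[Any, Row]:
    """Apply ordered change records to a snapshot keyed by primary key."""
    result = dict(snapshot)  # copy so the input snapshot is not mutated
    for change in changes:
        op = change["Op"]
        # Strip the operation marker; the rest is the row payload.
        row = {k: v for k, v in change.items() if k != "Op"}
        pk = row[key]
        if op in ("I", "U"):
            # Insert and update are both an upsert on the primary key.
            result[pk] = row
        elif op == "D":
            # Deleting a missing key is tolerated so replays stay idempotent.
            result.pop(pk, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


# Hypothetical change records, in commit order.
changes = [
    {"Op": "I", "id": 1, "name": "alice"},
    {"Op": "I", "id": 2, "name": "bob"},
    {"Op": "U", "id": 1, "name": "alice2"},
    {"Op": "D", "id": 2, "name": "bob"},
]

final = apply_changes({}, changes)
print(final)  # {1: {'id': 1, 'name': 'alice2'}}
```

The order of the change records matters: applying the same records out of order could resurrect a deleted row or overwrite a newer value, which is why a table format with upsert support (such as Hudi) is useful on top of plain S3 files.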