amazon-web-services, amazon-ec2, aws-data-pipeline

AWS IAM Setup for EC2 Resource in AWS Data Pipeline


I am having trouble getting AWS Data Pipeline to run on an EC2 instance via a ShellCommandActivity.

I have been following the guide found here step by step: https://medium.com/@SarwatFatimaM/data-scientists-guide-setting-up-aws-datapipeline-for-running-python-etl-scripts-using-c6c8fa4de70d

The primary issue is that the pipeline hangs in the WAITING_FOR_RUNNER status. I have confirmed that my Python script and .bat file (I had to change from .sh because I am using a Windows EC2 instance) run inside the desired EC2 instance. However, from what I can tell, the issue is a result of the warning I am receiving inside the Data Pipeline Architect:

Errors/Warnings
Object:DefaultResource1
WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.

I have tried editing the IAM roles so that DataPipelineDefaultRole and DataPipelineDefaultResourceRole both have the AmazonEC2FullAccess, AmazonS3FullAccess, AWSDataPipelineRole, and AWSDataPipeline_FullAccess policies attached. I have also tried the inline policies suggested in "AWS Data Pipeline: Issue with permissions S3 Access for IAM role" and at https://forums.aws.amazon.com/thread.jspa?threadID=241048.
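
For reference, the permissions named in the warning can be written out as IAM policy documents. The following is a minimal sketch, not the exact policies from the linked threads: the function names are hypothetical, `"Resource": "*"` is an assumption made for brevity, and reading "sts:AssumeRole permissions for DataPipeline" as a trust relationship with the `datapipeline.amazonaws.com` service principal is an interpretation of the warning text.

```python
import json


def s3_access_policy():
    """Inline policy granting the S3 actions listed in the warning.

    Assumption: "Resource": "*" is used for brevity; a real policy
    would normally be scoped to specific buckets.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*", "s3:Put*"],
                "Resource": "*",
            }
        ],
    }


def trust_policy(service_principals):
    """Trust policy letting the given service principals assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": list(service_principals)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # Assumed principal for the pipeline role (DataPipelineDefaultRole).
    print(json.dumps(trust_policy(["datapipeline.amazonaws.com"]), indent=2))
    print(json.dumps(s3_access_policy(), indent=2))
```

The printed JSON can be compared against the trust relationship and permissions shown for each role in the IAM console.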

I have let these policies sit for hours and I have rebuilt the pipeline a few times but I still keep getting that specific warning. Do you have any ideas?


Solution

  • According to the AWS Data Pipeline documentation linked below, a custom AMI must run Linux. The pipeline therefore cannot currently run on a Windows EC2 instance; use a Linux EC2 instance instead.

    https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-ami.html
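
    As a rough illustration of the Linux-based setup, the sketch below builds a fragment of a pipeline definition in which a ShellCommandActivity runs on an Ec2Resource. It is not a complete definition (there is no schedule or default object), and the object ids, the instance type, and the command are hypothetical placeholders. Leaving out `imageId` is assumed to make the service fall back to its default Linux AMI; a custom Linux AMI would be set through that field instead.

```python
import json


def build_pipeline_objects():
    """Return a fragment of a pipeline definition as plain dicts."""
    ec2_resource = {
        "id": "LinuxEc2Resource",   # hypothetical id
        "type": "Ec2Resource",
        "instanceType": "t2.micro",  # hypothetical instance type
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole",
        # No "imageId": assumed to fall back to the default Linux AMI.
    }
    shell_activity = {
        "id": "RunEtlScript",        # hypothetical id
        "type": "ShellCommandActivity",
        # Hypothetical command; a shell command rather than a .bat file.
        "command": "python3 etl.py",
        "runsOn": {"ref": ec2_resource["id"]},
    }
    return [ec2_resource, shell_activity]


if __name__ == "__main__":
    print(json.dumps({"objects": build_pipeline_objects()}, indent=2))
```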