I am having trouble getting AWS Data Pipeline to run a ShellCommandActivity on an EC2 instance.
I have been following the guide found here step by step: https://medium.com/@SarwatFatimaM/data-scientists-guide-setting-up-aws-datapipeline-for-running-python-etl-scripts-using-c6c8fa4de70d
The primary issue I am running into is that the pipeline hangs in the WAITING_FOR_RUNNER status.
I have confirmed that my Python script and .bat file (I had to change from .sh because I am using a Windows EC2 instance) run inside the desired EC2 instance. However, from what I can tell, the issue is a result of the warning I am receiving in the Data Pipeline Architect:
Errors/Warnings
Object:DefaultResource1
WARNING: Could not validate S3 Access for role. Please ensure role ('DataPipelineDefaultRole') has s3:Get*, s3:List*, s3:Put* and sts:AssumeRole permissions for DataPipeline.
I have tried editing the IAM roles so that DataPipelineDefaultRole and DataPipelineDefaultResourceRole both have the AmazonEC2FullAccess, AmazonS3FullAccess, AWSDataPipelineRole, and AWSDataPipeline_FullAccess policies attached. I have also tried the suggested inline policies shown here: "AWS Data Pipeline: Issue with permissions S3 Access for IAM role" and here: https://forums.aws.amazon.com/thread.jspa?threadID=241048.
I have let these policies sit for hours and I have rebuilt the pipeline a few times but I still keep getting that specific warning. Do you have any ideas?
According to the AWS Data Pipeline documentation linked below, a custom AMI must run Linux. This setup therefore cannot currently work on a Windows EC2 instance; the activity has to run on a Linux EC2 instance instead.
https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-custom-ami.html
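To illustrate the fix, here is a minimal sketch of a pipeline definition that runs the script through a ShellCommandActivity on a Linux Ec2Resource. The helper name build_pipeline_definition, the object ids, the instance type, the command, and the S3 log path are all placeholders I made up for illustration. If imageId is omitted, Data Pipeline should fall back to its default Linux AMI; if you supply a custom AMI, it needs to meet the requirements in the documentation above.

```python
import json


def build_pipeline_definition(command, log_uri, image_id=None):
    """Build a pipeline definition dict with a Linux Ec2Resource.

    All ids and values here are illustrative placeholders, not values
    taken from the original pipeline.
    """
    resource = {
        "id": "LinuxEc2Resource",
        "name": "LinuxEc2Resource",
        "type": "Ec2Resource",
        "instanceType": "t1.micro",   # placeholder instance type
        "terminateAfter": "1 Hour",
    }
    if image_id:
        # Only set a custom AMI when one is given; it must be a Linux AMI.
        resource["imageId"] = image_id

    activity = {
        "id": "RunEtlScript",
        "name": "RunEtlScript",
        "type": "ShellCommandActivity",
        "command": command,           # a shell command, not a .bat file
        "runsOn": {"ref": resource["id"]},
    }

    default = {
        "id": "Default",
        "name": "Default",
        "scheduleType": "ondemand",
        "failureAndRerunMode": "CASCADE",
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole",
        "pipelineLogUri": log_uri,
    }
    return {"objects": [default, resource, activity]}


if __name__ == "__main__":
    definition = build_pipeline_definition(
        command="python3 etl.py",                  # hypothetical script name
        log_uri="s3://example-bucket/logs/",       # hypothetical bucket
    )
    print(json.dumps(definition, indent=2))
```

The printed JSON can be saved to a file and used as the pipeline definition. Setting pipelineLogUri is worth doing in any case, since the logs written there are often the quickest way to see why a resource never picked up its task.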