I've been trying to create some infrastructure that includes bunch of services like EC2, ECS, S3 and Batch (few more). Everything seems to be fine, till it reaches the step to build the batch process.
I was following a medium blog and here's the CF template: Github Repo Link
This YAML is outdated and I have made some modifications here and there, but not the ones with roles.
I've had more than 3 CloudFormation stacks stuck in roll back because it can't stabilise the Compute Environment it builds from the YAML config I have. I reached out to Compute Environment to see the exact error and this is what I get:
DELETING - CLIENT_ERROR - User: batch.amazonaws.com is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::402726478692:role/service-role/AWSBatchServiceRole (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: f9d6c19d-4e77-4814-ac2c-b437e0546977; Proxy: null)
Now, It won't even delete this compute environment on automated rollback. But, my main concern is why is it not able to create? I've gone through documentation and few questions here regarding the same topic, but nothing seemed to work.
Here's the excerpt from my YAML config. This part is for compute environment:
ComputeEnvironment:
Type: "AWS::Batch::ComputeEnvironment"
Properties:
Type: MANAGED
ServiceRole: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AWSBatchServiceRole"
ComputeEnvironmentName: !Sub "${Environment}-batch-processing_3"
ComputeResources:
MaxvCpus: 1
SecurityGroupIds:
- !Ref SecurityGroup
Type: EC2
Subnets: !Ref Subnets
MinvCpus: 1
InstanceRole: !Ref ECSInstanceProfile
InstanceTypes:
- "c6gd.medium"
Tags: {"Name": !Sub "${Environment} - Batch Instance" }
DesiredvCpus: 1
State: ENABLED
JobQueue:
DependsOn: ComputeEnvironment
Type: "AWS::Batch::JobQueue"
Properties:
ComputeEnvironmentOrder:
- Order: 1
ComputeEnvironment: !Ref ComputeEnvironment
State: ENABLED
Priority: 1
JobQueueName: "HighPriority"
Job:
Type: "AWS::Batch::JobDefinition"
Properties:
Type: container
JobDefinitionName: !Sub "${Environment}-batch-s3-processor"
ContainerProperties:
Memory: 2048
Privileged: false
JobRoleArn: !Ref JobRole
ReadonlyRootFilesystem: true
Vcpus: 1
Image: !Sub "${AWS::AccountId}.dkr.ecr.us-west-2.amazonaws.com/${DockerImage}"
RetryStrategy:
Attempts: 1
JobRole:
Type: "AWS::IAM::Role"
Properties:
Path: "/"
RoleName: !Sub "${Environment}-BatchJobRole"
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Action:
- "sts:AssumeRole"
Effect: "Allow"
Principal:
Service:
- "ecs-tasks.amazonaws.com"
- "batch.amazonaws.com"
Policies:
-
PolicyName: !Sub "${Environment}-s3-access"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action:
- "s3:*"
- "iam:*"
- "batch:*"
Resource: !Sub "arn:aws:s3:::batch-${AWS::AccountId}-${AWS::Region}/*"
ECSInstanceProfile:
Type: "AWS::IAM::InstanceProfile"
Properties:
Path: "/"
Roles:
- !Ref ECSRole
ECSRole:
Type: "AWS::IAM::Role"
Properties:
Path: "/"
RoleName: !Sub "${Environment}-batch-ecs-role"
SourceAccount:
Ref: AWS::AccountId
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
-
Action: "sts:AssumeRole"
Effect: "Allow"
Principal:
Service:
- "ec2.amazonaws.com"
- "batch.amazonaws.com"
Policies:
- PolicyName: !Sub "${Environment}-full-access-for-batch-resource"
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action:
- "s3:*"
- "iam:*"
- "batch:*"
Resource: !Sub "arn:aws:s3:::batch-${AWS::AccountId}-${AWS::Region}/*"
- PolicyName: !Sub ${Environment}-ecs-batch-policy
PolicyDocument:
Version: "2012-10-17"
Statement:
-
Effect: "Allow"
Action:
- "ecs:CreateCluster"
- "ecs:DeregisterContainerInstance"
- "ecs:DiscoverPollEndpoint"
- "ecs:Poll"
- "ecs:RegisterContainerInstance"
- "ecs:StartTelemetrySession"
- "ecs:StartTask"
- "ecs:Submit*"
- "logs:CreateLogStream"
- "logs:PutLogEvents"
- "logs:DescribeLogStreams"
- "logs:CreateLogGroup"
- "ecr:BatchCheckLayerAvailability"
- "ecr:BatchGetImage"
- "ecr:GetDownloadUrlForLayer"
- "ecr:GetAuthorizationToken"
- "s3:*"
- "batch:*"
Resource: "*"
- PolicyName: !Sub "${Environment}-ecs-instance-policy"
PolicyDocument:
Statement:
-
Effect: "Allow"
Action:
- "ecs:DescribeContainerInstances"
- "ecs:ListClusters"
- "ecs:RegisterTaskDefinition"
- "s3:*"
- "batch:*"
Resource: "*"
-
Effect: "Allow"
Action:
- "ecs:*"
- "s3:*"
- "batch:*"
Resource: "*"
As you can see I've tried giving more than enough permissions in these policies which is already a bad practice, but I still can't get it to Assume Role. Any help would be appreciated.
EDIT: I have checked and I can see the AWSBatchServiceRole
and I have added AWSBatchServiceRole
and AWSBatchFullAccess
permissions to it and in the Trust Relationship
, I do have Sts:AssumeRole
in there. This is the JSON from Trust Relationship
:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "batch.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
One of my friend figured it out and it worked. It was a dumb mistake.
Changed arn:aws:iam::${AWS::AccountId}:role/service-role/AWSBatchServiceRole
to arn:aws:iam::${AWS::AccountId}:role/AWSBatchServiceRole
and it worked.
service-role/
isn't required, at least not now.