I am facing a persistent AccessDeniedException when trying to execute nested AWS Step Functions in my test environment. The same setup works perfectly in my Integration environment.
child-state-machine-1 and child-state-machine-2 run as parallel steps, and both of them fail with the following error:
{
  "cause": "User: arn:aws:sts::1234556:assumed-role/SomeRole/xyz is not authorized to access this resource (Service: AWSStepFunctions; Status Code: 400; Error Code: AccessDeniedException; Request ID: SomeID; Proxy: null)",
  "error": "StepFunctions.AWSStepFunctionsException",
  "resource": "startExecution.sync",
  "resourceType": "states"
}
Setup details:
I have a state machine (main-state-machine) that starts other state machines via states:StartExecution or states:StartSyncExecution.
The main state machine runs under the IAM role main-state-machine-role, to which I have attached the following policy, both through Terraform and through the AWS Console:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "states:StartExecution",
        "states:StartSyncExecution"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-1",
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-2",
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-3"
      ]
    }
  ]
}
Things I have checked:
The IAM policy is correctly attached to the role in the staging environment (verified in AWS Console).
The state machine ARNs in the policy exactly match the actual state machine ARNs.
The development environment, which uses the same policy and role setup, works fine.
What could be the cause of this AccessDeniedException in staging, even though the policy and ARNs are correct and the same setup works in integration?
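First, check whether the role needs additional permissions to execute the child state machines. Your task uses startExecution.sync, so the role must also be able to describe and stop the child executions, not only start them.
A sample configuration looks like this: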
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "states:StartExecution",
        "states:StartSyncExecution",
        "states:DescribeExecution",
        "states:StopExecution"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-1",
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-2",
        "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:child-state-machine-3",
        "arn:aws:states:REGION:ACCOUNT_ID:execution:child-state-machine-1:*",
        "arn:aws:states:REGION:ACCOUNT_ID:execution:child-state-machine-2:*",
        "arn:aws:states:REGION:ACCOUNT_ID:execution:child-state-machine-3:*"
      ]
    }
  ]
}
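One more thing worth checking for startExecution.sync: the AWS documentation for nested Step Functions executions also lists events:PutTargets, events:PutRule and events:DescribeRule on the managed rule StepFunctionsGetEventsForStepFunctionsExecutionRule. Note too that execution ARNs separate the state machine name and the execution name with a colon. Here is a rough Python sketch that builds such a policy so it can be diffed against what is deployed in each environment. The function name build_nested_sfn_policy and the region and account values are made up for illustration.

```python
import json


def build_nested_sfn_policy(region, account_id, child_names):
    """Build an IAM policy dict for starting child state machines with .sync.

    Hypothetical helper: it only assembles JSON and never calls AWS.
    """
    prefix = f"arn:aws:states:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Starting an execution is authorized on the state machine ARN.
                "Effect": "Allow",
                "Action": ["states:StartExecution"],
                "Resource": [f"{prefix}:stateMachine:{n}" for n in child_names],
            },
            {
                # Describe/Stop are authorized on execution ARNs (colon separated).
                "Effect": "Allow",
                "Action": ["states:DescribeExecution", "states:StopExecution"],
                "Resource": [f"{prefix}:execution:{n}:*" for n in child_names],
            },
            {
                # Managed EventBridge rule used by the .sync integration.
                "Effect": "Allow",
                "Action": [
                    "events:PutTargets",
                    "events:PutRule",
                    "events:DescribeRule",
                ],
                "Resource": [
                    f"arn:aws:events:{region}:{account_id}:rule/"
                    "StepFunctionsGetEventsForStepFunctionsExecutionRule"
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder region and account ID; substitute your own.
    children = [f"child-state-machine-{i}" for i in (1, 2, 3)]
    print(json.dumps(build_nested_sfn_policy("us-east-1", "123456789012", children), indent=2))
```

If the integration role happens to carry the events permissions (for example through a broader attached policy) and the test role does not, that alone could explain the difference between the two environments.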
Also check for any Service Control Policies (SCPs) that apply to your test environment. An SCP can block Step Functions actions even when the IAM policy allows them. It would also help to know what differs between the two environments, given that the same setup works in integration but not in test.
Then verify the trust policy of the role, in your case main-state-machine-role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "states.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
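If you want to check this programmatically, something like the following Python sketch could work. It takes the trust policy as a dict (for instance the AssumeRolePolicyDocument returned by aws iam get-role) and reports whether any Allow statement lets states.amazonaws.com call sts:AssumeRole. The function name trusts_step_functions is hypothetical, and the check deliberately ignores Condition blocks, so treat it as a first pass only.

```python
def _as_list(value):
    """IAM JSON allows a single string where a list is expected; normalize it."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def trusts_step_functions(trust_policy):
    """Return True if an Allow statement lets states.amazonaws.com assume the role.

    Simplification (assumption): Condition blocks are not evaluated.
    """
    for stmt in _as_list_of_statements(trust_policy):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        # Principal can be the string "*"; only a dict carries a Service key.
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        actions = _as_list(stmt.get("Action"))
        if "sts:AssumeRole" in actions and "states.amazonaws.com" in services:
            return True
    return False


def _as_list_of_statements(policy):
    """Statement may be a single object or a list of objects."""
    statements = policy.get("Statement", [])
    return [statements] if isinstance(statements, dict) else list(statements)


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "states.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    print(trusts_step_functions(policy))
```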
Things to verify before going further:
Do the REGION and ACCOUNT_ID in your policy match the test environment exactly?
Do the state machine names in the ARNs match exactly?
Does the role exist in the test environment?
Are there any conflicting IAM policies, for example ones shared between two AWS services?
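Some of these checks can be done mechanically. The Python sketch below does two things: it pulls the account ID and role name out of the "cause" string of the error, so you can confirm the failing call is really made by the role you attached the policy to, and it lists policy ARNs whose region or account differs from the expected values. The helper names (caller_from_cause, parse_arn, find_mismatches) and the sample account IDs are invented for illustration.

```python
import re


def caller_from_cause(cause):
    """Extract (account_id, role_name) from an assumed-role ARN in an error message."""
    match = re.search(r"arn:aws:sts::(\d+):assumed-role/([^/\s]+)/", cause)
    return (match.group(1), match.group(2)) if match else None


def parse_arn(arn):
    """Split an ARN into its six colon-separated parts."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn!r}")
    keys = ("prefix", "partition", "service", "region", "account", "resource")
    return dict(zip(keys, parts))


def find_mismatches(arns, expected_region, expected_account):
    """Return the ARNs whose region or account differs from the expected values."""
    mismatched = []
    for arn in arns:
        parsed = parse_arn(arn)
        if parsed["region"] != expected_region or parsed["account"] != expected_account:
            mismatched.append(arn)
    return mismatched


if __name__ == "__main__":
    cause = (
        "User: arn:aws:sts::1234556:assumed-role/SomeRole/xyz is not "
        "authorized to access this resource"
    )
    print(caller_from_cause(cause))

    # Placeholder ARNs: the second one points at a different account.
    arns = [
        "arn:aws:states:us-east-1:111111111111:stateMachine:child-state-machine-1",
        "arn:aws:states:us-east-1:222222222222:stateMachine:child-state-machine-2",
    ]
    print(find_mismatches(arns, "us-east-1", "111111111111"))
```

If the role name in the error is not main-state-machine-role, the state machine in the test environment is running under a different role than the one you edited.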