Going to rewrite this question because I've gotten a lot of updated information.
My issue is as follows:
I have an EMR cluster with one Master node and one Slave node. The Slave node is configured with unfettered access to the open internet (I know this is a security risk).
When I set up this cluster with a bootstrap action that simply calls `sudo yum -y update`, it fails, reporting that the bootstrap action failed on the Slave node (it always succeeds on the Master).
However, if I SSH into the Slave node and manually run `sudo yum -y update`, the operation succeeds on the EMR 5.5.0 release.
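Since the same command succeeds when run manually later, one thing I tried was making the bootstrap action more forgiving. This is only a sketch under the assumption that the failure is a transient network/DNS race during early boot; the attempt count and delay are arbitrary:

```shell
#!/bin/bash
# Hypothetical, more forgiving bootstrap action.
# Assumption: the slave-node failure is a transient network race during
# early boot, so retrying the update may succeed where one attempt fails.

# retry MAX_ATTEMPTS CMD...: run CMD until it succeeds, up to MAX_ATTEMPTS
# times, sleeping 1s between attempts; return 1 if all attempts fail.
retry() {
  local max=$1; shift
  local n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "failed after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}

# On the cluster, the bootstrap script would end with (commented out here):
# retry 5 sudo yum -y update
```

This didn't resolve the problem for me, which is part of why I suspect something other than a simple timing race.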
I am unable to debug further because, despite having configured it correctly to the best of my knowledge, EMR does not copy any logs to S3 (the log copying is sporadic at best) and CloudWatch does not pick up any logs from the VPC, which makes this issue hard to diagnose.
Any information would be appreciated.
Edit: I was able to get my CloudWatch VPC flow logs working (apparently my IAM role didn't have the trust relationship needed to upload logs), and they show a lot of REJECT entries for the Slave node while the Master node shows none. This makes me suspect some autoconfiguration is going on that prevents the Slave node from properly downloading yum packages.
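For anyone hitting the same missing-logs wall: the IAM role that VPC Flow Logs publishes through needs a trust relationship that lets the flow-logs service assume it, roughly like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "vpc-flow-logs.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

Once I attached this trust policy to the role, the REJECT entries started appearing in CloudWatch.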
In the tradition of asking obscure questions and managing to resolve them on my own, let me share my mitigation.
It turns out this is a problem with the EMR-5.5.0 release label. Downgrading to EMR-5.3.0 fixed the issue, and the bootstrap script now executes as expected.
It seems the timing or manner in which bootstrap scripts are executed may have changed in 5.5.0.