amazon-web-services amazon-emr aws-security-group private-subnet

Why does EMR in a private subnet need full outbound internet access?


The AWS documentation at the link below says to allow full outbound internet access on the EMR master security group for a cluster that is in a private subnet.

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html#emr-sg-elasticmapreduce-master-private

However, full outbound access poses risks. What is the rationale behind this full outbound internet access?


Solution

  • Below is what I could gather after contacting AWS Support:

    Outbound rules on your security group apply only when your cluster nodes initiate new connections to external IPs (i.e., any IP other than localhost or the node's own private IP). This is why we allow unrestricted outbound connections: they are initiated by the node itself.

    It is important to understand that while your cluster is launching, it needs connectivity to S3 to download the necessary repos, upload and download logs, cluster information, etc. Moreover, the application provisioning phase in EMR consists of successfully configuring many internal services/components (such as ResourceManager, NameNode, NodeManager, DataNode, etc.), all of which operate on different random ports within the cluster itself, so it is necessary to allow all TCP communication between the master and slave/core node security groups. The master node also communicates over SSL for the majority of instance controller communications, and with other cluster manager components, to configure the necessary software and exchange heartbeat signals, so ports 443 and 80 need to be open.

    In addition, Hadoop talks to different applications, each of which runs on its own ports and on different private IP addresses as the cluster adds or removes nodes. So we cannot provide a list of specific ports to open for cluster operations, because the port and protocol requirements vary depending on the applications configured on the EMR cluster. Tasks on the cluster might fail if nodes are not able to communicate with each other, or with any external dependency, on the required ports, which include the ephemeral port range.

    Therefore, please note that the recommended configuration for managed security group egress rules is 0.0.0.0/0, especially during cluster launch, as restricting it could leave EMR unable to download the required applications and cause cluster provisioning to fail.

    However, I understand that you are looking for the minimum recommended settings for outbound rules on "Amazon EMR–Managed Security Groups" instead of 0.0.0.0/0 (All traffic), as this may pose a security risk.

    It is highly advisable not to make any changes during cluster launch. Even after the cluster has launched, improperly configured outbound security group rules can cause issues. You may update the security group rules after the cluster has launched successfully, but there are a few things to consider before doing so:

    1. Allow all TCP, UDP, and ICMPv4 traffic between the master and slave node security groups so the nodes can communicate with each other.
    2. Depending on your use case, EMR nodes talk to the endpoints of services such as S3, DynamoDB, and KMS, and reaching AWS endpoints requires outbound HTTP and HTTPS connectivity. Attach the endpoints your use case needs (S3 endpoint, DynamoDB endpoint, KMS endpoint, etc.) to your VPC, then explicitly allow HTTP and HTTPS connectivity to those endpoints in the outbound rules of both the master and slave managed security groups.
    3. Allow all traffic in your outbound rules to the S3 endpoint of the region your cluster runs in, so EMR can get the required packages from S3.
    4. If you enable debugging in EMR, add the SQS endpoint CIDR ranges for that region to the outbound rules.
    5. If EMRFS consistent view is enabled, add the DynamoDB endpoint CIDR ranges to the outbound rules.
    6. In addition, add any other ports that your application specifically requires.
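    As a rough illustration of the checklist above, the sketch below maps cluster features to the AWS service endpoints that would need outbound HTTP/HTTPS access. The function name, its parameters, and the `uses_kms` flag are hypothetical and only encode the points listed by AWS Support; they are not part of any EMR API.

```python
def required_endpoints(debugging=False, emrfs_consistent_view=False, uses_kms=False):
    """Return the AWS services whose endpoints need outbound HTTP/HTTPS access.

    Hypothetical helper: it only encodes the checklist from the support answer.
    """
    services = ["s3"]  # always needed: packages, logs, cluster information
    if debugging:
        services.append("sqs")  # EMR debugging uses SQS
    if emrfs_consistent_view:
        services.append("dynamodb")  # EMRFS consistent view uses DynamoDB
    if uses_kms:
        services.append("kms")  # assumption: the cluster encrypts with KMS keys
    return services


if __name__ == "__main__":
    print(required_endpoints(debugging=True, emrfs_consistent_view=True))
```

    Application-specific ports (point 6) still have to be added by hand, since they depend on what runs on the cluster.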

    Hence, once the cluster has launched successfully, you can try to restrict the outbound rules according to your use case. They could look like this:

    Outbound rules for ElasticMapReduce-master configuration:

    Type        Protocol    Port Range  Destination
    HTTP        TCP         80          0.0.0.0/0
    HTTPS       TCP         443         0.0.0.0/0
    AllTraffic  TCP         0 - 65535   ElasticMapReduce-master security group ID
    AllTraffic  TCP         0 - 65535   ElasticMapReduce-slave security group ID
    

    Outbound rules for ElasticMapReduce-slave configuration:

    Type        Protocol    Port Range  Destination
    HTTP        TCP         80          0.0.0.0/0
    HTTPS       TCP         443         0.0.0.0/0
    AllTraffic  TCP         0 - 65535   ElasticMapReduce-master security group ID
    AllTraffic  TCP         0 - 65535   ElasticMapReduce-slave security group ID
    

    Note: AllTraffic covers all TCP, UDP, and ICMPv4 traffic to the slave node security group and the master node security group. Add any other ports that your application specifically requires.
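    The tables above could be expressed as egress permissions in the shape that boto3's EC2 `authorize_security_group_egress` accepts for `IpPermissions`. This is a minimal sketch: the function name and the security group IDs are placeholders, `"-1"` (all protocols) is used for the AllTraffic rows as described in the note, and nothing here calls AWS.

```python
def build_egress_permissions(master_sg_id, slave_sg_id):
    """Build the restricted outbound rules from the tables as IpPermissions dicts."""
    # HTTP and HTTPS to anywhere, for S3 and other AWS service endpoints.
    web_rules = [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": description}],
        }
        for port, description in ((80, "HTTP"), (443, "HTTPS"))
    ]
    # All protocols ("-1") to the master and slave managed security groups.
    intra_cluster_rules = [
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": sg_id}]}
        for sg_id in (master_sg_id, slave_sg_id)
    ]
    return web_rules + intra_cluster_rules


if __name__ == "__main__":
    # Placeholder IDs; substitute the real managed security group IDs.
    for rule in build_egress_permissions("sg-master-placeholder", "sg-slave-placeholder"):
        print(rule)
```

    The same list would apply to both the master and the slave group. The existing allow-all egress rule would also have to be revoked for the restriction to take effect, and only after the cluster has launched.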

    Also, please note that pinpointing exactly which routes an EMR cluster requires is very difficult because there are so many moving parts, which is why it is not recommended. We cannot outline exactly which rules you need for your specific cluster, because every cluster differs depending on the applications and integrations used. If you absolutely need to do this, enable VPC flow logs on all ENIs in your EMR subnet and go through them using CloudWatch Logs Insights or Athena (if you're pushing to S3).
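    If the flow logs are exported as text, a first pass over them could look like the sketch below. It assumes the default (version 2) flow log record format, counts accepted records whose source is inside the EMR subnet and whose destination is outside it, and groups them by destination address, port, and protocol. The function name, the subnet CIDR, and the sample records are made up for illustration. Because flow logs record both directions of a connection, replies to inbound connections can also show up and need to be filtered out by hand.

```python
import ipaddress
from collections import Counter

# IANA protocol numbers for the protocols mentioned in this answer.
PROTOCOLS = {"1": "icmp", "6": "tcp", "17": "udp"}


def summarize_egress(lines, subnet_cidr):
    """Count accepted flows leaving subnet_cidr, keyed by (dstaddr, dstport, protocol)."""
    subnet = ipaddress.ip_network(subnet_cidr)
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Default format: version account-id interface-id srcaddr dstaddr
        # srcport dstport protocol packets bytes start end action log-status
        if len(fields) < 14 or fields[0] != "2":
            continue  # header line or a custom format this sketch does not handle
        srcaddr, dstaddr, dstport, protocol, action = (
            fields[3], fields[4], fields[6], fields[7], fields[12]
        )
        if action != "ACCEPT":
            continue
        try:
            src = ipaddress.ip_address(srcaddr)
            dst = ipaddress.ip_address(dstaddr)
        except ValueError:
            continue  # NODATA/SKIPDATA records carry "-" instead of addresses
        if src in subnet and dst not in subnet:
            counts[(dstaddr, int(dstport), PROTOCOLS.get(protocol, protocol))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2 123456789012 eni-0abc 10.0.1.15 52.216.10.20 49152 443 6 10 8400 1600000000 1600000060 ACCEPT OK",
        "2 123456789012 eni-0abc 10.0.1.15 10.0.1.20 40000 8020 6 5 400 1600000000 1600000060 ACCEPT OK",
        "2 123456789012 eni-0abc 52.216.10.20 10.0.1.15 443 49152 6 8 6400 1600000000 1600000060 ACCEPT OK",
    ]
    for key, count in summarize_egress(sample, "10.0.1.0/24").items():
        print(key, count)
```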

    I strongly recommend testing any security group changes in a development environment before applying them in production.