java aws-sdk emr amazon-emr

AWS EMR - Get master node IP from Java code


I want to implement the following flow from Java code:

  1. Create a new AWS EMR instance (using AWS SDK)
  2. Connect to the AWS EMR using the Hive JDBC (required IP)
  3. Run my "SQL" queries on the EMR
  4. Destroy the AWS EMR (using AWS SDK)

My problem is that when I create an EMR cluster using the SDK, I can only retrieve its AWS ID, something like j-XXXXXXXXXXX. But to connect over JDBC I need the master node's IP. How can I obtain the master node IP from code?
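To make the dependency on the IP concrete, here is a minimal sketch of how the JDBC URL for step 2 could be built once the master address is known. The class name `HiveJdbcUrl` is hypothetical, and the sketch assumes HiveServer2 is listening on its default port 10000 with the `default` database, which may differ on a customized cluster.

```java
// Hypothetical helper (not part of the AWS SDK or the Hive driver).
final class HiveJdbcUrl {
    // Assumption: HiveServer2 runs on its default port.
    private static final int DEFAULT_HIVESERVER2_PORT = 10000;

    private HiveJdbcUrl() {}

    /** Builds a HiveServer2 JDBC URL using the default port and database. */
    static String forMaster(String masterHost) {
        return forMaster(masterHost, DEFAULT_HIVESERVER2_PORT, "default");
    }

    /** Builds a HiveServer2 JDBC URL for an explicit port and database. */
    static String forMaster(String masterHost, int port, String database) {
        if (masterHost == null || masterHost.trim().isEmpty()) {
            throw new IllegalArgumentException("masterHost must not be empty");
        }
        return "jdbc:hive2://" + masterHost.trim() + ":" + port + "/" + database;
    }
}
```

The resulting string would then be passed to `java.sql.DriverManager.getConnection(...)`, with the Hive JDBC driver on the classpath.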

I'm following this JDBC example page

==UPDATE==
I tried AmazonElasticMapReduceClient.describeCluster, but it only returns the public DNS name, while I'm looking for the private IP.


Solution

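  • As far as I know, there is no single call that returns it, but you can get it with two API calls: describeCluster gives the master's public DNS name, and listInstances gives each instance's public DNS name and private IP. Match the two on the DNS name: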

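    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
    import com.amazonaws.services.elasticmapreduce.model.Cluster;
    import com.amazonaws.services.elasticmapreduce.model.DescribeClusterRequest;
    import com.amazonaws.services.elasticmapreduce.model.Instance;
    import com.amazonaws.services.elasticmapreduce.model.ListInstancesRequest;
    import com.amazonaws.services.elasticmapreduce.model.ListInstancesResult;

    public String getMasterNodeIp(AmazonElasticMapReduceClient emr, String emrId) throws Exception {
        // Call 1: the cluster description exposes only the master's public DNS name.
        Cluster cluster = emr.describeCluster(new DescribeClusterRequest().withClusterId(emrId)).getCluster();
        String masterDnsName = cluster.getMasterPublicDnsName();
        // Call 2: the instance list carries both the DNS name and the private IP.
        ListInstancesResult instances = emr.listInstances(new ListInstancesRequest().withClusterId(emrId));
        for (Instance instance : instances.getInstances()) {
            // Null-safe comparison: either DNS name may be missing (e.g. cluster still starting).
            if (masterDnsName != null && masterDnsName.equals(instance.getPublicDnsName())) {
                return instance.getPrivateIpAddress();
            }
        }
        throw new Exception("Failed to find master node private IP.");
    }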
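One practical caveat: right after the cluster is created, the master DNS name and instance details may not be populated yet, so the lookup above can fail if it runs too early. A minimal sketch of a polling helper follows. The class `ClusterReadiness` and its parameters are hypothetical; the `Supplier<String>` is a stand-in for a call such as `describeCluster(...).getCluster().getStatus().getState()`, and the state names are the EMR cluster states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper: polls a cluster-state supplier until the cluster is usable.
final class ClusterReadiness {
    // States in which the master node is up.
    private static final Set<String> READY =
            new HashSet<>(Arrays.asList("RUNNING", "WAITING"));
    // States from which the cluster will never become ready.
    private static final Set<String> FAILED =
            new HashSet<>(Arrays.asList("TERMINATING", "TERMINATED", "TERMINATED_WITH_ERRORS"));

    private ClusterReadiness() {}

    /** Returns the first ready state seen, or throws if the cluster fails or the attempts run out. */
    static String awaitReady(Supplier<String> stateSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String state = stateSupplier.get();
            if (READY.contains(state)) {
                return state;
            }
            if (FAILED.contains(state)) {
                throw new IllegalStateException("Cluster ended in state " + state);
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for cluster", e);
            }
        }
        throw new IllegalStateException("Cluster not ready after " + maxAttempts + " attempts");
    }
}
```

With this in place, the flow would be: create the cluster, call `awaitReady` with a supplier that reads the cluster state, then call `getMasterNodeIp`.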