jdbc hadoop cloudera

Connecting to Named Queue in Hadoop with JDBC


I have an install of Cloudera on AWS. I am trying to set it up so that it has multiple named queues, and so that I can connect to those queues using JDBC and execute a query.

From what I have been able to gather so far, once the queues are there, connecting to them with JDBC is rather simple because it just has the format:

http://<server name>:<port>/<queue name>

However, after running around looking at reams of different documentation, it's still not clear how to set the queues up in the first place. It seems that if you have a hadoop-site.xml file, you go in there and add the property mapred.queue.name with a comma-separated string of queue names. But Cloudera does not have that file. It does have a mapred-site.xml, but after adding that property there, asking for a list of queues from the command line still just returned default.

Then we tried to use the FairScheduler, but the one we have is the new YARN-based one, which has the notion of balancing work between named queues.

So what I am looking for is:

  1. a way to just create 2 queues, e.g. Engineering and Marketing
  2. show that once I have them, I can connect using JDBC to either one
  3. and execute a query

After that, I can worry about using ACLs to give the queues different access to different parts of the data, and possibly to manage access to resources. For now, I am just looking to show that I can get at the data exposed through the named queues.


Solution

  • So it turns out that you get named queues when you opt for a scheduler that uses them. This took a lot of research because in the first version of Hadoop, the FairScheduler used pools, not queues, and only the CapacityScheduler used queues. In Hadoop 2.x, the FairScheduler has been redone to use queues, but that is still beta.
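
Once a queue-based scheduler is in place, the JDBC side might look roughly like the sketch below. It rests on assumptions that are not part of the question: that the JDBC endpoint is HiveServer2, that a Hive JDBC driver is on the classpath, and that the queue is picked by passing the mapreduce.job.queue.name property in the "hive conf" part of the URL (after the ?). The host name, user name, and the class and method names are hypothetical. The queue name Engineering is taken from the question.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class NamedQueueJdbc {

    // Builds a HiveServer2-style JDBC URL (assumed endpoint). The part after '?'
    // asks for jobs started by this session to be submitted to the given queue.
    static String buildUrl(String host, int port, String database, String queue) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database
                + "?mapreduce.job.queue.name=" + queue;
    }

    // Opens a connection, runs one query, and returns how many rows came back.
    static int countRows(String url, String user, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int rows = 0;
            while (rs.next()) {
                rows++;
            }
            return rows;
        }
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical host; 10000 is the usual HiveServer2 port.
        String url = buildUrl("hive-host.example.com", 10000, "default", "Engineering");
        System.out.println(url);
        // Only try to connect when a query is given on the command line.
        if (args.length > 0) {
            System.out.println(countRows(url, "hypothetical_user", args[0]));
        }
    }
}
```

Swapping "Engineering" for "Marketing" in buildUrl would point the same code at the other queue. Whether the server honors the property can depend on how the cluster is configured, so treat this as a starting point rather than a confirmed recipe.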