Tags: rest, hadoop, hbase, stargate

How do I use the REST API on an HBase cluster with load balancing?


I have a pilot HBase cluster with 1 master and 5 slave nodes. I want to access the cluster through its REST API (basically to write ad impression data via GETs). Later I want to run aggregated reports using Hadoop/Hive/Pig (TBD), so I want a single picture of the data.

Do I start the REST server on the master and just write to that single endpoint, or do I start a REST server instance on each slave node and load balance writes across the slave nodes?

(The latter doesn't seem right, but I saw it mentioned in the docs, so I'm a little confused.)


Solution

  • I use the REST API with load balancing provided by nginx. Your nginx config would look something like this:

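    # Place inside nginx's http { } context (for example, a file under conf.d/).
    # One entry per node running the HBase REST server on port 1234.
    upstream cluster
    {
        server master:1234;
        server slave1:1234;
        server slave2:1234;
        server slave3:1234;
        server slave4:1234;
        server slave5:1234;
    }
    server
    {
        listen 4444;
        server_name someserver.com;
        location /
        {
            proxy_pass http://cluster;
            proxy_set_header X-Real-IP $remote_addr;
            # Retry on another node if one fails or returns a 5xx error.
            # Since nginx 1.9.13, POST requests are only retried if
            # "non_idempotent" is added to this list.
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503;
        }
    }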
    

    Then start a REST server on every node in the cluster (hbase-daemon.sh start rest -p 1234 does the same but runs it in the background):

    hbase rest start -p 1234
    

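    Your clients then send all REST calls to someserver.com:4444. For writes and single-row reads, each REST server acts as a stateless gateway: it uses the HBase client to forward the request to the region server that owns the row, so a write through any node lands in the same table, and your later Hive or Pig reports see a single picture of the data. The exception is the REST scanner API, because a scanner ID exists only on the instance that created it; scanner requests would need sticky routing (for example, ip_hash in the upstream block).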
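As a sketch of what a client-side write through the balancer might look like, the bash functions below PUT a single cell using the HBase REST JSON format, in which the row key, the column (family:qualifier) and the value are all base64-encoded. The function names are illustrative, the endpoint is assumed to be the someserver.com:4444 address from the answer, and row keys are assumed to be URL-safe because they are placed in the request path without escaping.

```shell
#!/usr/bin/env bash
# Sketch: write one cell through the nginx load balancer with the HBase REST
# JSON format. Function names are illustrative; table and column names are
# supplied by the caller. Assumes row keys are URL-safe (not escaped in the path).
set -euo pipefail

# Base64-encode a string with no trailing newline and no line wrapping.
b64() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Build the JSON body for a single-cell write.
# HBase REST expects the row key, "family:qualifier" column and value in base64.
build_cell_json() {
  local row="$1" column="$2" value="$3"
  printf '{"Row":[{"key":"%s","Cell":[{"column":"%s","$":"%s"}]}]}' \
    "$(b64 "$row")" "$(b64 "$column")" "$(b64 "$value")"
}

# PUT the cell to http://<endpoint>/<table>/<row>; any REST node behind nginx
# can accept it. --fail makes curl exit non-zero on an HTTP error status.
put_cell() {
  local endpoint="$1" table="$2" row="$3" column="$4" value="$5"
  curl -sS --fail -X PUT \
    -H "Content-Type: application/json" \
    -H "Accept: application/json" \
    --data "$(build_cell_json "$row" "$column" "$value")" \
    "http://${endpoint}/${table}/${row}"
}
```

For example, put_cell someserver.com:4444 impressions ad123-20240101 d:count 1 would write one cell to a hypothetical impressions table with a hypothetical column family d. PUT is used rather than POST because nginx treats PUT as idempotent, so proxy_next_upstream can retry it on another node without extra configuration.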