I have a pilot HBase cluster with 1 master and 5 slave nodes. I want to access the cluster via its REST API, basically to write ad impression data via GETs. I want to be able to run aggregated reports using Hadoop/Hive/Pig (TBD) later, so I want a single picture of the data.
Do I start the REST server on the master and just write to that single endpoint, or do I start a REST server instance on each slave node and load balance writes across the slave nodes?
(The latter doesn't seem right, but I saw some mention of it in the docs, so I am a little confused.)
I use the REST API with load balancing provided through nginx. Your nginx config would look something like this:
    upstream cluster
    {
        server master:1234;
        server slave1:1234;
        server slave2:1234;
        server slave3:1234;
        server slave4:1234;
    }

    server
    {
        listen 4444;
        server_name someserver.com;

        location /
        {
            proxy_pass http://cluster;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_next_upstream error timeout invalid_header http_500 http_502 http_503;
        }
    }
On all servers in the cluster, you would start the REST server on the same port:

    hbase rest start -p 1234
You would then send your REST calls to someserver.com:4444, and nginx distributes them across the REST servers.
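For what a write through that endpoint might look like, here is a rough Python sketch. It assumes the HBase REST JSON row format, where row keys, column names, and values are base64-encoded and a row is written with a PUT to /table/rowkey. The table name "impressions", the column "d:campaign", and the helper names are made up for illustration, so adjust them to your schema.

```python
import base64
import json
import urllib.parse
import urllib.request


def b64(text):
    """Base64-encode a string, as the HBase REST JSON format expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_row_payload(row_key, cells):
    """Build the JSON body for one row; cells maps 'family:qualifier' to a value."""
    return {
        "Row": [
            {
                "key": b64(row_key),
                "Cell": [
                    {"column": b64(column), "$": b64(value)}
                    for column, value in cells.items()
                ],
            }
        ]
    }


def build_put_request(base_url, table, row_key, cells):
    """Prepare (but do not send) a PUT that writes one row via the REST gateway."""
    body = json.dumps(build_row_payload(row_key, cells)).encode("utf-8")
    url = "{}/{}/{}".format(base_url, table, urllib.parse.quote(row_key, safe=""))
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


def write_impression(base_url, table, row_key, cells):
    """Send the PUT through the nginx endpoint and return the HTTP status code."""
    request = build_put_request(base_url, table, row_key, cells)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling write_impression("http://someserver.com:4444", "impressions", "ad42-20120101", {"d:campaign": "spring"}) would send the write to whichever REST server nginx picks. All of the REST servers talk to the same HBase cluster, so the data still ends up in one place for later reporting.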