apache-zookeeper apache-phoenix knox-gateway apache-knox

How to load balance several Phoenix Query Servers behind a Knox gateway?


I have 3 Phoenix Query Servers (PQS) running behind a Knox gateway (which hides the Kerberos authentication complexity), accessed through Simba's ODBC driver. I can reach one Phoenix Query Server and run queries through Knox by mapping, in the topology file, the AVATICA service directly to the internal IP address and port of one PQS in my internal network. I would like Knox to randomly access any of my 3 Phoenix Query Servers, not just one. Can I achieve this with ZooKeeper, and if so, how do I configure it?

I have already tried load balancing by pointing the Knox topology at an nginx reverse proxy with my 3 PQS set as upstream servers, but I get a 401 error, apparently related to how my credentials are transmitted through the proxy.

My odbc.ini file:

[phoenixovh]
Driver=/opt/hortonworks/phoenixodbc/lib/64/libphoenixodbc_sb64.so
Host=knox.<clusterid>.datalake.ovh
Port=443
AuthMech=2
UID=<user>
PWD=<password>
LogLevel=0
ConnectionSyncInterval=120
SSL=1
HttpPath=gateway/default/avatica
TransportMode=http
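
For reference, the endpoint that the driver presumably targets can be derived from these settings. The Python sketch below assumes the driver simply combines `SSL`, `Host`, `Port` and `HttpPath` into one URL; that is an assumption about the driver's behaviour, and the host name used here is a hypothetical placeholder, not the real one.

```python
import configparser

# Trimmed copy of the DSN above; the host is a hypothetical placeholder.
ODBC_INI = """
[phoenixovh]
Host=knox.example.invalid
Port=443
SSL=1
HttpPath=gateway/default/avatica
TransportMode=http
"""


def avatica_url(ini_text, dsn):
    """Build the URL the driver is assumed to call for the given DSN."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key case used in odbc.ini
    parser.read_string(ini_text)
    section = parser[dsn]
    # Assumption: SSL=1 means HTTPS, anything else means plain HTTP.
    scheme = "https" if section.get("SSL", "0") == "1" else "http"
    path = section["HttpPath"].strip("/")
    return f"{scheme}://{section['Host']}:{section['Port']}/{path}"


print(avatica_url(ODBC_INI, "phoenixovh"))
```

This is the Knox-side URL; the `<url>` entries in the topology file are what Knox forwards to internally.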

Part of my Knox topology file, default.xml (working for 1 PQS):

<service>
  <role>AVATICA</role>
  <url>internal_address__and_port_of_url_of_one_pqs</url>
</service>


Solution

  • I finally managed to reach my 3 PQS by following the Knox HA guide (https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration): I added an HA provider section to my topology file and provided 3 URLs in the service configuration instead of one:

      <provider>
        <role>ha</role>
        <name>HaProvider</name>
        <enabled>true</enabled>
        <param>
          <name>AVATICA</name>
          <value>maxFailoverAttempts=3;failoverSleep=1000;maxRetryAttempts=300;retrySleep=1000;enabled=true</value>
        </param>
      </provider>
    </gateway>
    
    ...
    
    <service>
      <role>AVATICA</role>
      <url>internal url of PQS1</url>
      <url>internal url of PQS2</url>
      <url>internal url of PQS3</url>
    </service>
    
    

    The Knox guide mentions this approach as well as one based on a ZooKeeper connection string, but does not provide any insight into which solution is better.
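
To make the `maxFailoverAttempts` and `failoverSleep` values above more concrete, here is a toy model in Python of failover across a list of URLs. It is not Knox's implementation: the function names and the `send` callback are hypothetical, and the retry parameters are parsed but not used. Whether Knox spreads requests across all three URLs or only switches to the next one when the current one fails may depend on the Knox version and its configuration; this sketch models only the failover reading.

```python
import time


def parse_ha_params(value):
    """Split a 'key=value;key=value' string into a dict of strings."""
    return dict(item.split("=", 1) for item in value.split(";") if item)


def call_with_failover(urls, send, params, sleep=time.sleep):
    """Try the active URL; on a connection error move to the next one.

    Gives up once maxFailoverAttempts failovers have been used.
    """
    max_failover = int(params["maxFailoverAttempts"])
    pause_seconds = int(params["failoverSleep"]) / 1000.0  # value is in ms
    active = 0
    failovers = 0
    while True:
        try:
            return send(urls[active])
        except ConnectionError:
            if failovers >= max_failover:
                raise
            failovers += 1
            active = (active + 1) % len(urls)  # wrap around the URL list
            sleep(pause_seconds)


# Usage with a fake backend where the first PQS is down (hypothetical names).
def fake_send(url):
    if url == "pqs1":
        raise ConnectionError("pqs1 is down")
    return "ok from " + url


params = parse_ha_params(
    "maxFailoverAttempts=3;failoverSleep=1000;"
    "maxRetryAttempts=300;retrySleep=1000;enabled=true"
)
print(call_with_failover(["pqs1", "pqs2", "pqs3"], fake_send, params,
                         sleep=lambda seconds: None))
```

In this model a request only reaches the second or third PQS after the earlier ones fail, which is worth keeping in mind when comparing this setup with a real load balancer.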